Abstract

Median clustering is of great value for partitioning relational data. In this paper, a new
prototype-based clustering method called Median Evidential C-Means (MECM) is proposed. It extends
median c-means and median fuzzy c-means within the theoretical framework of belief functions.
The median variant relaxes the restriction of a metric space embedding for the objects
but constrains the prototypes to be in the original data set. Owing to these properties, MECM can
be applied to graph clustering problems. A community detection scheme for social networks based
on MECM is investigated, and the obtained credal partitions of graphs, which are more refined than
crisp and fuzzy ones, give a better understanding of the graph structures. An initial
prototype-selection scheme based on evidential semi-centrality is presented to avoid premature
convergence to local optima, and an evidential modularity function is defined to choose the optimal
number of communities. Finally, experiments on synthetic and real data sets illustrate the
performance of MECM and show how it differs from other methods.

Keywords: Credal partition, Belief function theory, Median clustering, Community detection, 
Imprecise communities 


1. Introduction 

Cluster analysis, or clustering, is the task of partitioning a set of $n$ objects $X = \{x_1, x_2, \cdots, x_n\}$
into $c$ small groups $\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}$ in such a way that objects in the same group (called a
cluster) are more similar (in some sense or another, such as characteristics or behavior) to each other
than to those in other groups. Clustering can be used in many fields such as privacy preserving
[24], information retrieval [5] and text analysis [47]. It can also be used as the first step of
classification problems to identify the distribution of the training set [44]. Among the existing
approaches to clustering, objective function-driven or prototype-based clustering, such as c-means
and Gaussian mixture modeling, is one of the most widely applied paradigms in statistical
pattern recognition. These methods are based on a fundamentally very simple, but nevertheless
very effective idea, namely to describe the data under consideration by a set of prototypes. The
prototypes capture the characteristics of the data distribution (like location, size, and shape), and the
data set is classified based on the similarities (or dissimilarities) of the objects to their prototypes [6].

Generally, a $c$-partition of $n$ objects in $X$ is a set of $n \times c$ values $\{u_{ij}\}$ arranged as an $n \times c$ matrix
$U$. Each element $u_{ij}$ is the membership of $x_i$ to cluster $j$. The classical C-Means (CM) method
aims to partition $n$ observations into $c$ groups in which each observation belongs to the class with
the nearest mean, which serves as the prototype of the cluster. As a result, $u_{ij}$ is either 0 or 1 depending
on whether object $i$ is grouped into cluster $j$, and thus each data point is assigned to a single cluster
(hard partitions). Fuzzy C-Means (FCM), proposed by Dunn [11] and later improved by Bezdek
[3], is an extension of c-means where each data point can be a member of multiple clusters with
membership values (fuzzy partitions) [25].

Belief functions have already been used to express partial information about data both in
supervised and unsupervised learning [32, 41]. Recently, Masson and Denoeux [32] proposed the
application of evidential c-means (ECM) to get credal partitions [9] for object data. The credal
partition is a general extension of the crisp (hard) and fuzzy ones: it allows an object not only
to belong to single clusters, but also to belong to any subset of $\Omega$, by allocating a mass of belief
for each object in $X$ over the power set $2^\Omega$. The additional flexibility brought by the power set
provides more refined partitioning results than those of the other techniques, allowing us to gain a
deeper insight into the data [32].

All of the aforementioned partition approaches are prototype-based. In CM, FCM and ECM,
the prototypes of clusters are the geometric centers of the data points included in the corresponding
groups. However, this may be inappropriate, as is the case in community detection problems for
social networks, where the prototype (center) of one group is likely to be one of the persons (i.e.
nodes in the graph) playing the leader role in the community. That is to say, it is better to select
one of the points in the group as the prototype, rather than the center of all the points. Thus
we should set some constraints on the prototypes, for example, requiring them to be data objects.
This is in fact the basic principle of median clustering methods [16]. These restrictions on prototypes
relax the assumption of a metric space embedding for the objects to be clustered [16, 18],
and only the similarity or dissimilarity between data objects is required. There are some clustering
methods for relational data, such as Relational FCM (RFCM) [20] and Relational ECM (RECM)
[33], but they assume an underlying metric for the given dissimilarities between objects. In median
clustering, however, this restriction is dropped [16]. Cottrell et al. [8] proposed the Median C-Means
clustering method (MCM), a variant of the classic c-means, and proved the convergence
of the algorithm. Geweniger et al. [16] combined MCM with the fuzzy c-means approach and
investigated the behavior of the resulting Median Fuzzy C-Means (MFCM) algorithm.

Community detection, which can extract specific structures from complex networks, has attracted
considerable attention across many areas from physics, biology, and economics to sociology.
Recently, significant progress has been achieved in this research field and several popular
algorithms for community detection have been presented. One of the most popular types of classical
methods partitions networks by optimizing some criterion. Newman and Girvan [35] proposed
a network modularity measure (usually denoted by $Q$), and several algorithms that try to maximize
$Q$ have been designed [4, 7, 10, 42]. However, recent research has found that modularity-based
algorithms cannot detect communities smaller than a certain size. This problem is
known as the resolution limit [14]. A single optimization criterion, i.e. modularity, may
not be adequate to represent the structures in complex networks; thus Amiri et al. [1] formulated
community detection as a multi-objective optimization problem. Another family
of approaches considers hierarchical clustering techniques. It merges or splits clusters according
to a topological measure of similarity between the nodes and tries to build a hierarchical tree of
partitions [27, 37, 43]. There are also ways, such as spectral methods [40] and signal processing
methods [23, 26], to map the topological relationships of nodes in networks into geometrical structures
of vectors in $n$-dimensional Euclidean space, where classical clustering methods like CM, FCM and
ECM can be invoked. However, some accuracy is inevitably lost in the mapping process.
As mentioned before, for community detection, the prototypes should be some nodes in the graph.
Besides, usually only dissimilarities between nodes are known. Because it relaxes the requirements
on the data objects and constrains the prototypes, median clustering can
be applied to the community detection problem in social networks.

In this paper, we extend the median clustering methods in the framework of belief function
theory and put forward the Median Evidential C-Means (MECM) algorithm. Moreover, a community
detection scheme based on MECM is also presented. Here, we emphasize two key points
that differ from earlier studies. Firstly, the proposed approach can provide credal partitions
for data sets for which only dissimilarities are known. The dissimilarity measure need be neither
symmetric nor fulfil any metric requirements. It is only required to be of intuitive meaning. Thus it
expands the application scope of credal partitions. Secondly, some practical issues about how to apply
the method to community detection problems, such as how to determine the initial prototypes
and the optimum number of communities in the sense of credal partitions, are discussed. This makes
the approach appropriate for graph partitions and gives us a better understanding of the analysed
networks, especially for the uncertain and imprecise structures.

The rest of this paper is organized as follows. Section 2 recalls the necessary background
related to this paper. In Section 3, the median evidential c-means algorithm is presented, and in
Section 4 we show how the proposed method can be applied to the community detection problem. In
order to show the effectiveness of our approach, in Section 5 we test our algorithm on artificial and
real-world data sets and make comparisons with different methods. The final section concludes
the paper.


2. Background 

2.1. Theory of belief functions 

Let $\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}$ be the finite domain of $X$, called the discernment frame. The mass
function is defined on the power set $2^\Omega = \{A : A \subseteq \Omega\}$.

Definition 1. The function $m : 2^\Omega \to [0,1]$ is said to be the Basic Belief Assignment (bba) on $2^\Omega$
if it satisfies:

$$\sum_{A \subseteq \Omega} m(A) = 1. \tag{1}$$

Every $A \in 2^\Omega$ such that $m(A) > 0$ is called a focal element. The credibility and plausibility functions
are defined in Eq. (2) and Eq. (3).

$$Bel(A) = \sum_{B \subseteq A, B \neq \emptyset} m(B), \quad \forall A \subseteq \Omega, \tag{2}$$

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Omega. \tag{3}$$

Each quantity $Bel(A)$ measures the total support given to $A$, while $Pl(A)$ can be interpreted as
the degree to which the evidence fails to support the complement of $A$. The function $pl : \Omega \to [0,1]$
such that $pl(\omega_i) = Pl(\{\omega_i\})$ ($\omega_i \in \Omega$) is called the contour function associated with $m$. A belief
function on the credal level can be transformed into a probability function by Smets' method, in
which each mass of belief $m(A)$ is equally distributed among the elements of $A$ [39]. This
leads to the concept of pignistic probability, $BetP$, defined by

$$BetP(\omega_i) = \sum_{\omega_i \in A \subseteq \Omega} \frac{m(A)}{|A| (1 - m(\emptyset))}, \tag{4}$$

where $|A|$ is the number of elements of $\Omega$ in $A$.
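
As a small illustration of Eqs. (2)-(4), the following Python sketch evaluates the credibility, plausibility and pignistic probability of a bba stored as a dictionary from focal sets to masses. The function names `bel`, `pl` and `betp`, the dictionary representation and the numerical masses are assumptions made for this example only.

```python
def bel(m, A):
    """Credibility (Eq. 2): total mass of the non-empty subsets of A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """Plausibility (Eq. 3): total mass of the focal sets intersecting A."""
    return sum(v for B, v in m.items() if B & A)

def betp(m, w):
    """Pignistic probability of the singleton w (Eq. 4)."""
    empty = m.get(frozenset(), 0.0)
    return sum(v / (len(B) * (1.0 - empty)) for B, v in m.items() if w in B)

# Hypothetical bba on the frame {w1, w2, w3}; the numbers are illustrative only.
m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w1", "w2"}): 0.3,
    frozenset({"w1", "w2", "w3"}): 0.2,
}
A = frozenset({"w1", "w2"})
print(bel(m, A), pl(m, A))
print({w: betp(m, w) for w in ("w1", "w2", "w3")})
```

In this example the mass given to the whole frame supports $Pl(A)$ but not $Bel(A)$, so the two values bracket the degree of belief in $A$, while the pignistic probabilities sum to one over the three classes.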

Pignistic probabilities, which play the same role as fuzzy membership, can easily help us make
a decision. In fact, belief functions provide us with many decision-making techniques, not only in the
form of probability measures. For instance, a pessimistic decision can be made by maximizing
the credibility function, while maximizing the plausibility function provides an optimistic one
[31]. Another criterion [31] considers the plausibility functions and consists in attributing the class
$A_j$ to object $i$ if

$$A_j = \arg\max_{X \subseteq \Omega} \{ m_i^b(X) Pl_i(X) \}, \tag{5}$$

where

$$m_i^b(X) = K_i^b \lambda_X \left( \frac{1}{|X|^r} \right). \tag{6}$$

In Eq. (5), $m_i^b(X)$ is a weight on $Pl_i(X)$, and $r$ is a parameter in $[0,1]$ allowing a decision ranging
from a simple class ($r = 1$) to the total ignorance $\Omega$ ($r = 0$). The value $\lambda_X$ allows the integration of
the lack of knowledge on one of the focal sets $X \subseteq \Omega$, and it can simply be set to 1. Coefficient
$K_i^b$ is the normalization factor that constrains the mass to be in the closed world:

$$K_i^b = \frac{1}{1 - m_i(\emptyset)}. \tag{7}$$


2.2. Median c-means and median fuzzy c-means 


Median c-means is a variant of the traditional c-means method [8, 16]. We assume that $n$
($p$-dimensional) data objects $x_i = \{x_{i1}, x_{i2}, \cdots, x_{ip}\}$ ($i = 1, 2, \cdots, n$) are given. The object set is
denoted by $X = \{x_1, x_2, \cdots, x_n\}$. The objective function of MCM is similar to that of CM:

$$J_{MCM} = \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij} d_{ij}^2, \tag{8}$$

where $c$ is the number of clusters. As MCM is based on crisp partitions, $u_{ij}$ is either 0 or 1 depending
on whether $x_i$ is in cluster $j$. The value $d_{ij}$ is the dissimilarity between $x_i$ and the prototype vector
$v_j$ of cluster $j$ ($i = 1, 2, \cdots, n$, $j = 1, 2, \cdots, c$), which is not assumed to fulfill any metric
properties but should reflect the common sense of dissimilarity. Owing to these weak assumptions,
data object $x_i$ itself may be a general choice and it does not have to live in a metric space [16].
The main difference between MCM and CM is that the prototypes of MCM are restricted to the
data objects.

Median fuzzy c-means (MFCM) merges MCM and the standard fuzzy c-means (FCM). As
in MCM, it requires the knowledge of the dissimilarity between data objects, and the prototypes
are restricted to the objects themselves [16]. MFCM also performs a two-step iteration scheme to
minimize the cost function

$$J_{MFCM} = \sum_{j=1}^{c} \sum_{i=1}^{n} u_{ij}^{\beta} d_{ij}^2, \tag{9}$$

subject to the constraints

$$\sum_{k=1}^{c} u_{ik} = 1, \quad \forall i \in \{1, 2, \cdots, n\}, \tag{10}$$

and

$$\sum_{i=1}^{n} u_{ik} > 0, \quad \forall k \in \{1, 2, \cdots, c\}, \tag{11}$$

where each number $u_{ik} \in [0,1]$ is interpreted as a degree of membership of object $i$ to cluster $k$,
and $\beta > 1$ is a weighting exponent that controls the fuzziness of the partition. Again, MFCM is
performed by alternating update steps as for MCM:

• Assignment update:

$$u_{ij} = \frac{d_{ij}^{-2/(\beta-1)}}{\sum_{k=1}^{c} d_{ik}^{-2/(\beta-1)}}. \tag{12}$$

• Prototype update: the new prototype of cluster $j$ is set to be $v_j = x_{l^*}$ with

$$x_{l^*} = \arg\min_{x_l \in X} \sum_{i=1}^{n} u_{ij}^{\beta} d^2(x_i, x_l). \tag{13}$$
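
The two alternating updates of MFCM can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `mfcm`, the explicit `init` argument, the small offset `eps` that avoids a division by zero when an object coincides with a prototype, and the toy data are all choices made for this example.

```python
import numpy as np

def mfcm(D, init, beta=2.0, max_iter=100):
    """Median fuzzy c-means sketch on an (n, n) dissimilarity matrix D.

    init holds the indices of the objects used as initial prototypes.
    Returns (U, protos): the (n, c) membership matrix and prototype indices.
    """
    D = np.asarray(D, dtype=float)
    protos = np.asarray(init)
    eps = 1e-12  # assumption: small offset so that d = 0 does not divide by zero
    for _ in range(max_iter):
        # Assignment update, Eq. (12)
        w = (D[:, protos] + eps) ** (-2.0 / (beta - 1.0))
        U = w / w.sum(axis=1, keepdims=True)
        # Prototype update, Eq. (13): cost[j, l] = sum_i u_ij^beta * d^2(x_i, x_l)
        cost = (U ** beta).T @ (D ** 2)
        new_protos = cost.argmin(axis=1)
        if np.array_equal(new_protos, protos):
            break
        protos = new_protos
    return U, protos

# Toy data (illustrative only): two groups of points on a line.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
U, protos = mfcm(D, init=[0, 5])
print(protos, U.argmax(axis=1))
```

Only the dissimilarity matrix enters the computation, so the same sketch applies when no vector representation of the objects is available. Nothing in it prevents two clusters from selecting the same prototype for an unfavourable initialization; a practical implementation would need to handle that case.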


2.3. Evidential c-means 

Evidential c-means [32] is a direct generalization of FCM in the framework of belief functions,
and it is based on the credal partition first proposed by Denoeux and Masson [9]. The credal
partition takes advantage of imprecise (meta) classes [30] to express partial knowledge of class
memberships. The principle is different from another belief clustering method put forward by Schubert
[38], in which conflict between evidence is utilised to cluster the belief functions related to multiple
events. In ECM, the evidential membership of an object $x_i = \{x_{i1}, x_{i2}, \cdots, x_{ip}\}$ is represented
by a bba $m_i = (m_i(A_j) : A_j \subseteq \Omega)$ over the given frame of discernment $\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}$. The
optimal credal partition is obtained by minimizing the following objective function:

$$J_{ECM} = \sum_{i=1}^{n} \sum_{A_j \subseteq \Omega, A_j \neq \emptyset} |A_j|^{\alpha} m_i(A_j)^{\beta} d_{ij}^2 + \sum_{i=1}^{n} \delta^2 m_i(\emptyset)^{\beta} \tag{14}$$

constrained on

$$\sum_{A_j \subseteq \Omega, A_j \neq \emptyset} m_i(A_j) + m_i(\emptyset) = 1, \tag{15}$$

where $m_i(A_j) \triangleq m_{ij}$ is the bba of $x_i$ given to the nonempty set $A_j$, while $m_i(\emptyset) \triangleq m_{i\emptyset}$ is the bba of
$x_i$ assigned to the empty set, and $|\cdot|$ is the cardinality of the set. Parameter $\alpha$ is a tuning parameter
that controls the degree of penalization for subsets with high cardinality, parameter $\beta$ is a
weighting exponent and $\delta$ is an adjustable threshold for detecting the outliers. Note that for
credal partitions, $j$ does not range from 1 to $c$ as before, but ranges in $[0, f]$ with $f = 2^c$. Here $d_{ij}$ denotes
the Euclidean distance between $x_i$ and the barycenter (i.e. prototype, denoted by $\overline{v}_j$) associated
with $A_j$:

$$d_{ij} = \|x_i - \overline{v}_j\|, \tag{16}$$

where $\overline{v}_j$ is defined mathematically by

$$\overline{v}_j = \frac{1}{|A_j|} \sum_{k=1}^{c} s_{kj} v_k, \quad \text{with } s_{kj} = \begin{cases} 1 & \text{if } \omega_k \in A_j, \\ 0 & \text{else.} \end{cases} \tag{17}$$

The notation $\|\cdot\|$ denotes the Euclidean norm of a vector, and $v_k$ is the geometrical center of all
the points in cluster $k$. The update for ECM is given by the following two alternating steps, and
the update formulas can be obtained by the method of Lagrange multipliers.

• Assignment update:

$$m_{ij} = \frac{|A_j|^{-\alpha/(\beta-1)} d_{ij}^{-2/(\beta-1)}}{\sum_{A_k \neq \emptyset} |A_k|^{-\alpha/(\beta-1)} d_{ik}^{-2/(\beta-1)} + \delta^{-2/(\beta-1)}}, \quad \forall i = 1, 2, \cdots, n, \ \forall j / A_j \subseteq \Omega, A_j \neq \emptyset, \tag{18}$$

$$m_{i\emptyset} = 1 - \sum_{A_j \neq \emptyset} m_{ij}, \quad \forall i = 1, 2, \cdots, n. \tag{19}$$

• Prototype update: The prototypes (centers) of the classes are given by the rows of the matrix
$V_{c \times p}$, which is the solution of the following linear system:

$$HV = B, \tag{20}$$

where $H$ is a matrix of size $(c \times c)$ given by

$$H_{lk} = \sum_{i=1}^{n} \sum_{A_j \supseteq \{\omega_k, \omega_l\}} |A_j|^{\alpha-2} m_{ij}^{\beta}, \quad k, l = 1, 2, \cdots, c, \tag{21}$$

and $B$ is a matrix of size $(c \times p)$ defined by

$$B_{lq} = \sum_{i=1}^{n} x_{iq} \sum_{A_j \ni \omega_l} |A_j|^{\alpha-1} m_{ij}^{\beta}, \quad l = 1, 2, \cdots, c, \ q = 1, 2, \cdots, p. \tag{22}$$
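
The prototype update of ECM amounts to assembling $H$ and $B$ and solving one linear system. The sketch below illustrates this with NumPy. The function name `ecm_prototypes`, the encoding of focal sets as tuples of cluster indices and the toy data are assumptions made for this example.

```python
import numpy as np

def ecm_prototypes(X, M, focal_sets, alpha=1.0, beta=2.0):
    """Solve H V = B (Eqs. 20-22) for the c cluster centres.

    X: (n, p) data; M: (n, f) masses of the non-empty focal sets;
    focal_sets: list of f tuples of cluster indices (0-based).
    """
    c = 1 + max(k for A in focal_sets for k in A)
    p = X.shape[1]
    H = np.zeros((c, c))
    B = np.zeros((c, p))
    Mb = M ** beta
    for j, A in enumerate(focal_sets):
        col = Mb[:, j]                      # m_ij^beta for all objects i
        for l in A:
            # Eq. (22): contribution of A_j to row l of B
            B[l] += len(A) ** (alpha - 1.0) * (col @ X)
            for k in A:
                # Eq. (21): contribution of A_j to H_lk
                H[l, k] += len(A) ** (alpha - 2.0) * col.sum()
    return np.linalg.solve(H, B)

# Toy check (illustrative only): with crisp masses the centres are cluster means.
sets = [(0,), (1,), (0, 1)]
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
M = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
V = ecm_prototypes(X, M, sets)
print(V)
```

The solution rows are generally not data objects, which is precisely the property that the median variants give up in exchange for dropping the metric assumption.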

2.4. Some concepts for social networks

In this work we will investigate how the proposed clustering algorithm could be applied to 
community detection problems in social networks. In this section some concepts related to social 
networks will be recalled. 

2.4.1. Centrality and dissimilarity

The problem of assigning centrality values to nodes in graphs has been widely investigated
as it is important for identifying influential nodes [48]. Gao et al. [15] put forward Evidential
Semi-local Centrality (ESC) and pointed out that it is more reasonable than existing centrality
measures such as Degree Centrality (DC), Betweenness Centrality (BC) and Closeness Centrality
(CC). In the application of ESC, the degree and strength of each node are first expressed by bbas,
and then the fused importance is calculated using the combination rule in belief function theory.
The higher the ESC value is, the more important the node is. The detailed computation process of
ESC can be found in [15].

The similarity or dissimilarity index quantifies the proximity between two vertices
of a graph. The dissimilarity measure considered in this paper is the one put forward by Zhou [49].
This index relates a network to a discrete-time Markov chain and utilises the mean first-passage
time to express the distance between two nodes. One can refer to [49] for more details.
2.4.2. Modularity

Recently, many criteria were proposed for evaluating the partition of a network. A widely
used measure called modularity, or $Q$, was presented by Newman and Girvan [35]. Given a hard
partition with $c$ groups $\{\omega_1, \omega_2, \cdots, \omega_c\}$, $U = (u_{ik})_{n \times c}$, where $u_{ik}$ is one if vertex $i$ belongs to the
$k$th community and 0 otherwise, and let the $c$ crisp subsets of vertices be $\{V_1, V_2, \cdots, V_c\}$, then the
modularity can be defined as [13]:

$$Q_h = \frac{1}{\|W\|} \sum_{k=1}^{c} \sum_{i,j \in V_k} \left( w_{ij} - \frac{k_i k_j}{\|W\|} \right), \tag{23}$$

where $\|W\| = \sum_{i,j=1}^{n} w_{ij}$ and $k_i = \sum_{j=1}^{n} w_{ij}$. The node subsets $\{V_k, k = 1, 2, \cdots, c\}$ are determined
by the hard partition $U_{n \times c}$, but the role of $U$ is somewhat obscured by this form of modularity
function. To reveal the role played by the partition $U$ explicitly, Havens et al. [21] rewrote the
equations in the form of $U$. Let $k = (k_1, k_2, \cdots, k_n)^T$, $B = W - k k^T / \|W\|$, then

$$Q_h = \frac{1}{\|W\|} \sum_{k=1}^{c} \sum_{i,j=1}^{n} \left( w_{ij} - \frac{k_i k_j}{\|W\|} \right) u_{ik} u_{jk} = \frac{1}{\|W\|} \sum_{k=1}^{c} u_k^T B u_k = \mathrm{trace}(U^T B U) / \|W\|, \tag{24}$$

where $u_k = (u_{1k}, u_{2k}, \cdots, u_{nk})^T$.

Havens et al. [21] pointed out that an advantage of Eq. (24) is that it is well defined for any
partition of the nodes, not just crisp ones. The fuzzy modularity of $U$ was derived as

$$Q_f = \mathrm{trace}(U^T B U) / \|W\|, \tag{25}$$

where $U$ is the membership matrix and $u_{ik}$ represents the membership of community $k$ for node $i$.
If $u_{ik}$ is restricted to $\{0, 1\}$, the fuzzy partition degrades to the hard one, and $Q_f$ then equals
$Q_h$.
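
The trace form in Eqs. (24) and (25) translates directly into matrix operations. The following NumPy sketch is illustrative only; the function name `modularity` and the small two-triangle graph are assumptions made for this example.

```python
import numpy as np

def modularity(W, U):
    """Q = trace(U^T B U) / ||W|| with B = W - k k^T / ||W|| (Eqs. 24-25)."""
    W = np.asarray(W, dtype=float)
    U = np.asarray(U, dtype=float)
    total = W.sum()                 # ||W||, the sum of all weights
    k = W.sum(axis=1)               # node strengths k_i
    B = W - np.outer(k, k) / total
    return np.trace(U.T @ B @ U) / total

# Toy graph (illustrative only): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

U_hard = np.repeat(np.eye(2), 3, axis=0)      # nodes 0-2 and nodes 3-5
U_one = np.ones((6, 1))                       # a single community
print(modularity(W, U_hard), modularity(W, U_one))
```

The same function accepts a fuzzy membership matrix whose rows sum to one, since nothing in the trace form requires the entries of $U$ to be binary.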


3. Median Evidential C-Means (MECM) approach 

We introduce here median evidential c-means in order to take advantage of both median
clustering and credal partitions. As with all prototype-based clustering methods, an
objective function must first be defined for MECM to provide an immediate measure of the quality
of clustering results. Our goal can then be characterized as the optimization of the objective function to get
the best credal partition.

3.1. The objective function of MECM 

To group $n$ objects in $X = \{x_1, x_2, \cdots, x_n\}$ into $c$ clusters $\omega_1, \omega_2, \cdots, \omega_c$, the credal partition
$M = \{m_1, m_2, \cdots, m_n\}$ defined on $\Omega = \{\omega_1, \omega_2, \cdots, \omega_c\}$ is used to represent the class membership
of the objects, as in [9, 32]. The quantities $m_{ij} = m_i(A_j)$ ($A_j \neq \emptyset$, $A_j \subseteq \Omega$) are determined by the
dissimilarity between object $x_i$ and focal set $A_j$, which has to be defined first.

Let the prototype set of specific (singleton) clusters be $V = \{v_1, v_2, \cdots, v_c\}$, where $v_i$ is the
prototype vector of cluster $\omega_i$ ($i = 1, 2, \cdots, c$) and it must be one of the $n$ objects. If $|A_j| = 1$, i.e.,
$A_j$ is associated with one of the singleton clusters in $\Omega$ (say $\omega_j$ with prototype vector
$v_j$), then the dissimilarity between $x_i$ and $A_j$ is defined by

$$\overline{d}_{ij}^2 = d^2(x_i, v_j), \tag{26}$$

where $d(x_i, x_j)$ represents the known dissimilarity between objects $x_i$ and $x_j$. When $|A_j| > 1$,
$A_j$ represents an imprecise (meta) cluster. If object $x_i$ is to be assigned to a meta cluster,
two conditions should be satisfied. First, the dissimilarity values between $x_i$ and the prototypes of
the included singleton classes should be similar. Second, the object should be close to the prototypes
of all these specific clusters. The former measures the degree of uncertainty, while the latter
avoids the pitfall of assigning data objects that are irrelevant to all of the included specific clusters to
the corresponding imprecise classes. Let the prototype vector of the imprecise cluster associated
with $A_j$ be $\overline{v}_j$; then the dissimilarity between $x_i$ and $A_j$ can be defined as:

$$\overline{d}_{ij}^2 = \frac{\gamma \frac{1}{|A_j|} \sum_{\omega_k \in A_j} d^2(x_i, v_k) + \rho_{ij} \min\{d(x_i, v_k) : \omega_k \in A_j\}}{\gamma + 1}, \tag{27}$$

with

$$\rho_{ij} = \frac{\sum_{\omega_x, \omega_y \in A_j} \sqrt{(d(x_i, v_x) - d(x_i, v_y))^2}}{\eta \sum_{\omega_x, \omega_y \in A_j} d(v_x, v_y)}. \tag{28}$$

In Eq. (27), $\gamma$ weights the contribution of the dissimilarity of the objects from the constituent specific
clusters, and it can be tuned according to the application. If $\gamma = 0$, the imprecise clusters only consider
our uncertainty. The discounting factor $\rho_{ij}$ reflects the degree of uncertainty. If $\rho_{ij} = 0$,
all the dissimilarity values between $x_i$ and the included specific classes in $A_j$ are equal, and we
are absolutely uncertain about which cluster object $x_i$ actually belongs to. Parameter $\eta$ ($\in [0,1]$) can be
tuned to control the discounting degree. In credal partitions, we can distinguish between “equal
evidence” (uncertainty) and “ignorance”. The ignorance reflects the indistinguishability among the
clusters. In fact, imprecise classes take both uncertainty and ignorance into consideration, and we
can balance the two types of imprecise information by adjusting $\gamma$. Therefore, the dissimilarity
between $x_i$ and $A_j$ ($A_j \neq \emptyset$, $A_j \subseteq \Omega$), $\overline{d}_{ij}^2$, can be calculated by

$$\overline{d}_{ij}^2 = \begin{cases} d^2(x_i, v_j) & |A_j| = 1, \\ \dfrac{\gamma \frac{1}{|A_j|} \sum_{\omega_k \in A_j} d^2(x_i, v_k) + \rho_{ij} \min\{d(x_i, v_k) : \omega_k \in A_j\}}{\gamma + 1} & |A_j| > 1. \end{cases} \tag{29}$$
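
To make Eqs. (26)-(29) concrete, the sketch below computes $\overline{d}_{ij}^2$ for every object and every non-empty focal set from a dissimilarity matrix. It is a hedged illustration: the names `focal_sets` and `mecm_dissimilarity`, the encoding of prototypes as object indices, and the reading of the sums in Eq. (28) as running over ordered pairs of distinct classes are assumptions of this example. The prototypes are assumed to be distinct objects with positive mutual dissimilarity, so that the denominator of Eq. (28) is non-zero.

```python
import itertools
import numpy as np

def focal_sets(c):
    """Non-empty subsets of {0, ..., c-1}, singletons first."""
    return [A for r in range(1, c + 1)
            for A in itertools.combinations(range(c), r)]

def mecm_dissimilarity(D, protos, sets, gamma=1.0, eta=1.0):
    """Squared dissimilarity between each object and each focal set.

    D: (n, n) dissimilarity matrix; protos: indices of the c prototype objects.
    Returns an (n, len(sets)) array following Eqs. (26)-(29).
    """
    out = np.empty((D.shape[0], len(sets)))
    for j, A in enumerate(sets):
        idx = [protos[k] for k in A]
        d = D[:, idx]                               # d(x_i, v_k) for w_k in A_j
        if len(A) == 1:
            out[:, j] = d[:, 0] ** 2                # Eq. (26)
            continue
        pairs = list(itertools.permutations(range(len(A)), 2))
        spread = sum(np.abs(d[:, x] - d[:, y]) for x, y in pairs)
        between = sum(D[idx[x], idx[y]] for x, y in pairs)
        rho = spread / (eta * between)              # Eq. (28)
        out[:, j] = (gamma * (d ** 2).mean(axis=1)
                     + rho * d.min(axis=1)) / (gamma + 1.0)   # Eq. (27)
    return out

# Toy data (illustrative only): points on a line, prototypes at x = 1 and x = 11.
x = np.array([0.0, 1.0, 2.0, 6.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
sets = focal_sets(2)                    # [(0,), (1,), (0, 1)]
dbar2 = mecm_dissimilarity(D, [1, 5], sets)
print(dbar2[3], dbar2[2])
```

In this toy case the point at $x = 6$ is equidistant from both prototypes, so $\rho_{ij} = 0$ and its dissimilarity to the meta cluster is smaller than to either singleton, whereas the point at $x = 2$ remains much closer to the first singleton cluster.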


Like ECM, we propose to look for the credal partition $M = \{m_1, m_2, \cdots, m_n\} \in \mathbb{R}^{n \times 2^c}$ and
the prototype set $V = \{v_1, v_2, \cdots, v_c\}$ of specific (singleton) clusters by minimizing the objective
function:

$$J_{MECM}(M, V) = \sum_{i=1}^{n} \sum_{A_j \subseteq \Omega, A_j \neq \emptyset} |A_j|^{\alpha} m_{ij}^{\beta} \overline{d}_{ij}^2 + \sum_{i=1}^{n} \delta^2 m_{i\emptyset}^{\beta} \tag{30}$$

constrained on

$$\sum_{A_j \subseteq \Omega, A_j \neq \emptyset} m_{ij} + m_{i\emptyset} = 1, \tag{31}$$

where $m_{ij} \triangleq m_i(A_j)$ is the bba of $x_i$ given to the nonempty set $A_j$, $m_{i\emptyset} \triangleq m_i(\emptyset)$ is the bba of
$x_i$ assigned to the empty set, and $\overline{d}_{ij}^2$ is the dissimilarity between $x_i$ and focal set $A_j$. Parameters
$\alpha$, $\beta$, $\delta$ are adjustable with the same meanings as those in ECM. Note that $J_{MECM}$ depends on the
credal partition $M$ and the set $V$ of all prototypes.


3.2. The optimization 


To minimize $J_{MECM}$, an optimization scheme via an Expectation-Maximization (EM) algorithm,
as in MCM [8] and MFCM [16], can be designed, and the alternating update steps are as
follows:

Step 1. Credal partition ($M$) update.

• $\forall A_j \subseteq \Omega$, $A_j \neq \emptyset$,

$$m_{ij} = \frac{|A_j|^{-\alpha/(\beta-1)} \overline{d}_{ij}^{-2/(\beta-1)}}{\sum_{A_k \neq \emptyset} |A_k|^{-\alpha/(\beta-1)} \overline{d}_{ik}^{-2/(\beta-1)} + \delta^{-2/(\beta-1)}} \tag{32}$$

• if $A_j = \emptyset$,

$$m_{i\emptyset} = 1 - \sum_{A_j \neq \emptyset} m_{ij} \tag{33}$$

Step 2. Prototype ($V$) update.

The prototype $v_i$ of a specific (singleton) cluster $\omega_i$ ($i = 1, 2, \cdots, c$) can be updated first,
and then the dissimilarity between the object and the prototype of each imprecise (meta) cluster
associated with subset $A_j \subseteq \Omega$ can be obtained by Eq. (29). For singleton clusters $\omega_k$ ($k =
1, 2, \cdots, c$), the corresponding new prototypes $v_k$ are set, one after another, to the sample $x_l$
with

$$x_l = \arg\min_{v'_k} \left\{ L(v'_k) \triangleq \sum_{i=1}^{n} \sum_{\omega_k \in A_j} |A_j|^{\alpha} m_{ij}^{\beta} \overline{d}_{ij}^2(v'_k), \ v'_k \in \{x_1, x_2, \cdots, x_n\} \right\}. \tag{34}$$

The dissimilarity between $x_i$ and $A_j$, $\overline{d}_{ij}^2$, is a function of $v'_k$, which is the prototype of $\omega_k$ ($\in A_j$)
and should be one of the $n$ objects in $X = \{x_1, x_2, \cdots, x_n\}$.
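
The credal partition update of Step 1 can be sketched as follows. The function name `update_masses`, the array layout and the small offset `eps` (which avoids raising zero to a negative power when an object is itself a prototype) are assumptions made for this illustration; the input `dbar2` is taken to hold the squared dissimilarities of Eq. (29).

```python
import numpy as np

def update_masses(dbar2, card, alpha=1.0, beta=2.0, delta=10.0):
    """Credal partition update (Eqs. 32-33).

    dbar2: (n, f) squared dissimilarities to the f non-empty focal sets;
    card: (f,) cardinalities |A_j|.
    Returns (M, m_empty) with M of shape (n, f) and m_empty of shape (n,).
    """
    dbar2 = np.asarray(dbar2, dtype=float)
    card = np.asarray(card, dtype=float)
    eps = 1e-12                      # assumption: guards against dbar2 == 0
    e = 1.0 / (beta - 1.0)
    # |A_j|^(-alpha/(beta-1)) * dbar_ij^(-2/(beta-1)), written in terms of dbar2
    w = card ** (-alpha * e) * (dbar2 + eps) ** (-e)
    denom = w.sum(axis=1, keepdims=True) + delta ** (-2.0 * e)
    M = w / denom
    return M, 1.0 - M.sum(axis=1)

# Illustrative input: two objects, focal sets {w1}, {w2}, {w1, w2}.
dbar2 = np.array([[25.0, 25.0, 12.5],
                  [1.0, 81.0, 20.9]])
M, m_empty = update_masses(dbar2, card=[1, 1, 2])
print(M.round(3), m_empty.round(3))
```

For the first object the three non-empty focal sets receive equal mass, because the cardinality penalty $|A_j|^{\alpha}$ exactly offsets the smaller dissimilarity to the meta cluster. The second object is assigned almost entirely to the first singleton.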

The bbas of the objects’ class membership are updated identically to ECM [32], but it is worth
noting that $\overline{d}_{ij}$ has a different meaning and fewer constraints, as explained before. In the prototype
updating process, the fact that the prototypes are assumed to be data objects is taken
into account. Therefore, when the credal partition matrix $M$ is fixed, the new prototypes of the
clusters can be obtained in a simpler manner than in ECM. The MECM
algorithm is summarized as Algorithm 1.

Algorithm 1: Median evidential c-means algorithm

Input: dissimilarity matrix $D \triangleq [d(x_i, x_j)]_{n \times n}$ for the $n$ objects $\{x_1, x_2, \cdots, x_n\}$
Parameters:
    $c$: number of clusters, $1 < c < n$
    $\alpha$: weighting exponent for cardinality
    $\beta > 1$: weighting exponent
    $\delta > 0$: dissimilarity between any object and the empty set
    $\gamma > 0$: weight of dissimilarity between data and prototype vectors
    $\eta \in [0, 1]$: control of the discounting degree
Initialization: choose randomly $c$ initial cluster prototypes from the objects
Loop:
    $t \leftarrow 0$
    Repeat
        (1) $t \leftarrow t + 1$
        (2) Compute $M_t$ using Eq. (32), Eq. (33) and $V_{t-1}$
        (3) Compute the new prototype set $V_t$ using Eq. (34)
    Until the prototypes remain unchanged

The convergence of the MECM algorithm is established in the following lemma, whose proof is
similar to those for median neural gas [8] and MFCM [16].

Lemma 1. The MECM algorithm (Algorithm 1) converges in a finite number of steps.

Proof: Suppose $\theta^{(t)} = (M_t, V_t)$ and $\theta^{(t+1)} = (M_{t+1}, V_{t+1})$ are the parameters from two successive
iterations of MECM. We will first prove that

$$J_{MECM}(\theta^{(t)}) \geq J_{MECM}(\theta^{(t+1)}), \tag{35}$$

which shows that MECM always monotonically decreases the objective function. Let

$$J_{MECM}^{(t)} = \sum_{i=1}^{n} \sum_{A_j \subseteq \Omega, A_j \neq \emptyset} |A_j|^{\alpha} (m_{ij}^{(t)})^{\beta} (\overline{d}_{ij}^{(t)})^2 + \sum_{i=1}^{n} \delta^2 (m_{i\emptyset}^{(t)})^{\beta} \triangleq \sum_{i} \sum_{j} f_1(M_t) f_2(V_t) + \sum_{i} f_3(M_t), \tag{36}$$

where $f_1(M_t) = |A_j|^{\alpha} (m_{ij}^{(t)})^{\beta}$, $f_2(V_t) = (\overline{d}_{ij}^{(t)})^2$, and $f_3(M_t) = \delta^2 (m_{i\emptyset}^{(t)})^{\beta}$. $M_{t+1}$ is then obtained
by minimizing the right-hand side of the equation above with $V_t$ fixed. Thus,

\begin{align}
J_{MECM}^{(t)} &\geq \sum_{i} \sum_{j} f_1(M_{t+1}) f_2(V_t) + \sum_{i} f_3(M_{t+1}) \tag{37} \\
&\geq \sum_{i} \sum_{j} f_1(M_{t+1}) f_2(V_{t+1}) + \sum_{i} f_3(M_{t+1}) \tag{38} \\
&= J_{MECM}^{(t+1)}. \tag{39}
\end{align}

Inequality (37) comes from the fact that $M_{t+1}$ is determined by differentiating the respective
Lagrangian of the cost function with respect to $M_t$. To get Eq. (38), we use the fact that
every prototype $v_k$ ($k = 1, 2, \cdots, c$) in $V_{t+1}$ is chosen explicitly, one after another, to be

$$\arg\min_{v'_k} \left\{ L(v'_k) \triangleq \sum_{i=1}^{n} \sum_{\omega_k \in A_j} |A_j|^{\alpha} m_{ij}^{\beta} \overline{d}_{ij}^2(v'_k), \ v'_k \in \{x_1, x_2, \cdots, x_n\} \right\},$$

and thus this formula evaluated at $V_{t+1}$ must be equal to or less than the same formula evaluated
at $V_t$.

Hence MECM causes the objective function to converge monotonically. Moreover, the bba $M$
is a function of the prototypes $V$, and for given $V$ the assignment $M$ is unique. Because MECM
assumes that the prototypes are original data objects in $X$, there is a finite number of different
prototype vectors $V$, and hence a finite number of corresponding credal partitions $M$. Consequently,
the MECM algorithm converges in a finite number of steps.

□
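
The alternating scheme of Algorithm 1 and the monotone decrease used in the proof can be illustrated with the following compact Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the names `_dbar2` and `mecm` are hypothetical, the initial prototypes are passed explicitly instead of being drawn at random, the sums in Eq. (28) are taken over ordered pairs of distinct classes, a prototype is never allowed to coincide with another one (so that the denominator of Eq. (28) stays non-zero), and distinct objects are assumed to have positive dissimilarity.

```python
import itertools
import numpy as np

def _dbar2(D, protos, sets, gamma, eta):
    """Squared object-to-focal-set dissimilarities, Eqs. (26)-(29)."""
    out = np.empty((D.shape[0], len(sets)))
    for j, A in enumerate(sets):
        idx = [protos[k] for k in A]
        d = D[:, idx]
        if len(A) == 1:
            out[:, j] = d[:, 0] ** 2
            continue
        pairs = list(itertools.permutations(range(len(A)), 2))
        spread = sum(np.abs(d[:, x] - d[:, y]) for x, y in pairs)
        between = sum(D[idx[x], idx[y]] for x, y in pairs)
        rho = spread / (eta * between)
        out[:, j] = (gamma * (d ** 2).mean(axis=1) + rho * d.min(axis=1)) / (gamma + 1.0)
    return out

def mecm(D, init, alpha=1.0, beta=2.0, delta=10.0, gamma=1.0, eta=1.0, max_iter=50):
    """Returns (M, m_empty, protos, history); history stores J after each iteration."""
    protos = list(init)
    c, n = len(protos), D.shape[0]
    sets = [A for r in range(1, c + 1) for A in itertools.combinations(range(c), r)]
    card = np.array([len(A) for A in sets], dtype=float)
    e = 1.0 / (beta - 1.0)
    history = []
    for _ in range(max_iter):
        # Step 1: credal partition update, Eqs. (32)-(33)
        d2 = _dbar2(D, protos, sets, gamma, eta) + 1e-12
        w = card ** (-alpha * e) * d2 ** (-e)
        M = w / (w.sum(axis=1, keepdims=True) + delta ** (-2.0 * e))
        m_empty = 1.0 - M.sum(axis=1)
        coef = card ** alpha * M ** beta          # |A_j|^alpha * m_ij^beta
        # Step 2: prototype update, Eq. (34), one cluster after another.
        # Summing over all focal sets only adds a term that is constant in v_k.
        old = list(protos)
        for k in range(c):
            def cost(l):
                trial = protos[:k] + [l] + protos[k + 1:]
                return (coef * _dbar2(D, trial, sets, gamma, eta)).sum()
            candidates = [l for l in range(n) if l not in protos or l == protos[k]]
            protos[k] = min(candidates, key=cost)
        J = (coef * _dbar2(D, protos, sets, gamma, eta)).sum() \
            + delta ** 2 * (m_empty ** beta).sum()
        history.append(J)
        if protos == old:
            break
    return M, m_empty, protos, history

# Toy data (illustrative only): points on a line, initial prototypes at the ends.
x = np.array([0.0, 1.0, 2.0, 6.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
M, m_empty, protos, history = mecm(D, init=[0, 6])
print(protos, history)
```

The recorded values in `history` are non-increasing, in line with inequality (35), and the loop stops as soon as the prototype set repeats. The exhaustive search over candidate objects costs one evaluation of Eq. (29) per candidate and per cluster, so this sketch is only practical for small data sets.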


Remark 1. Although the objective function of MECM takes the same form as that of ECM
[32], we should note that in MECM it is no longer assumed that there is an underlying Euclidean
distance. Thus the dissimilarity measure $d_{ij}$ is subject to few restrictions; for instance, neither the
triangle inequality nor symmetry is required. This freedom distinguishes MECM from ECM and RECM,
and it leads to the constraint that the prototypes be data objects themselves. The distinct difference
in the minimization process between MECM and ECM lies in the prototype-update step. The purpose
of updating the prototypes is to make sure that the cost function decreases. In ECM the
Lagrange multiplier optimization is invoked directly, while in MECM a search method is applied.
As a result, the objective function may decline more quickly in ECM, as the optimization process
has fewer constraints. However, when the centers of clusters in the data set are more likely to be
data objects, MECM may converge in few steps.

Remark 2. Although both MECM and MFCM can be applied to the same type of data
set, they are very different. This is because they are founded on different models of
partitioning. MFCM provides fuzzy partitions. In contrast, MECM gives credal partitions. We
emphasize that MECM is in line with MCM and MFCM: each class is represented by a prototype
which is restricted to the data objects, and the dissimilarities are not assumed to fulfill any
metric properties. MECM is an extension of MCM and MFCM in the framework of belief functions.

3.3. The parameters of the algorithm 

As in ECM, before running MECM, the values of the parameters have to be set. Parameters
$\alpha$, $\beta$ and $\delta$ have the same meanings as those in ECM, and $\gamma$ weighs the contribution of uncertainty
to the dissimilarity between nodes and imprecise clusters. The value $\beta = 2$ is a usual choice and
can be used in all experiments. The parameter $\alpha$ aims to penalize the subsets with
high cardinality and controls the number of points assigned to imprecise clusters in both ECM and
MECM. As the measures for the dissimilarity between nodes and meta classes are different,
different values of $\alpha$ should be taken even for the same data set. But in both ECM and MECM,
the higher $\alpha$ is, the less mass of belief is assigned to the meta clusters and the less imprecise
the resulting partition will be. However, the decrease of imprecision may result in a high risk of errors. For
instance, in the case of hard partitions, the clustering results are completely precise but there is
a much higher tendency to assign an object to an unrelated group. The value of $\alpha$ suggested in [32]
can be used as a starting default, but it can be modified according to what the user expects.
The choice of $\delta$ is more difficult and is strongly data dependent [32].

For determining the number of clusters, the validity index of a credal partition defined by
Masson and Denoeux [32] could be utilised:

$$N^*(c) \triangleq \frac{1}{n \log_2(c)} \sum_{i=1}^{n} \left[ \sum_{A \in 2^\Omega \setminus \emptyset} m_i(A) \log_2 |A| + m_i(\emptyset) \log_2(c) \right], \tag{40}$$

where $0 \leq N^*(c) \leq 1$. This index has to be minimized to get the optimal number of clusters. When
MECM is applied to community detection, a different index is defined to determine the number of
communities. We will describe it in the next section.
4. Application and evaluation issues


In this section, we will discuss how to apply MECM to community detection problems in social 
networks and how to evaluate credal partitions. 

4.1. Evidential modular function

Assume the obtained credal partition of the graph is

$$M = [m_1, m_2, \cdots, m_n]^T,$$

where $m_i = (m_{i1}, m_{i2}, \cdots, m_{ic})^T$. Similarly to the fuzzy modularity by Havens et al. [21], here we
introduce an evidential modularity [50]:

$$Q_e = \frac{1}{\|W\|} \sum_{k=1}^{c} \sum_{i,j=1}^{n} \left( w_{ij} - \frac{k_i k_j}{\|W\|} \right) pl_{ik} pl_{jk}, \tag{41}$$

where $pl_i = (pl_{i1}, pl_{i2}, \cdots, pl_{ic})^T$ is the contour function associated with $m_i$, which describes the
upper value of our belief in the proposition that the $i$th node belongs to the $k$th community.

Let $PL = (pl_{ik})_{n \times c}$; then Eq. (41) can be rewritten as:

$$Q_e = \frac{\mathrm{trace}(PL^T B \, PL)}{\|W\|}. \tag{42}$$

$Q_e$ is a direct extension of the crisp and fuzzy modularity functions in Eq. (24). When the credal
partition degrades into the hard and fuzzy ones, $Q_e$ is equal to $Q_h$ and $Q_f$ respectively.
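
As an illustration of Eqs. (41) and (42), the sketch below derives the contour functions from a credal partition and evaluates $Q_e$. The function names `contour` and `evidential_modularity`, the encoding of focal sets as tuples of community indices, and the toy graph and masses are assumptions made for this example.

```python
import numpy as np

def contour(M, sets, c):
    """pl_ik: sum of the masses of the focal sets that contain community k."""
    PL = np.zeros((M.shape[0], c))
    for j, A in enumerate(sets):
        for k in A:
            PL[:, k] += M[:, j]
    return PL

def evidential_modularity(W, PL):
    """Q_e = trace(PL^T B PL) / ||W|| with B = W - k k^T / ||W|| (Eq. 42)."""
    total = W.sum()
    k = W.sum(axis=1)
    B = W - np.outer(k, k) / total
    return np.trace(PL.T @ B @ PL) / total

# Toy graph (illustrative only): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

# Hypothetical credal partition over the focal sets {w1}, {w2}, {w1, w2}:
# the two bridge nodes 2 and 3 carry some mass on the imprecise community.
sets = [(0,), (1,), (0, 1)]
M = np.array([[1, 0, 0], [1, 0, 0], [0.6, 0, 0.4],
              [0, 0.6, 0.4], [0, 1, 0], [0, 1, 0]], dtype=float)
PL = contour(M, sets, c=2)
print(PL[2], evidential_modularity(W, PL))
```

When every node puts all its mass on a singleton, `PL` reduces to the hard partition matrix and the value coincides with $Q_h$, as stated above.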


4.2. The initial prototypes for communities

Generally speaking, the person who is the center of a community in a social network has the 
following characteristics: he has relations with all the members of the group, and these relationships 
are stronger than usual; he may also be in direct contact with other persons who play an important role 
in their own communities. For instance, in the Twitter network, all the members of the community of 
fans of Real Madrid football Club (RMC) follow the official account of the team, so 
RMC must be the center of this community. RMC in turn follows the famous football players in the club, 
each of whom is sure to be the center of the community of his own fans. In fact, RMC has 10382777 followers 
and 30 followings (data from March 16, 2014), and most of the followings have more than 500000 
followers. Therefore, the centers of the communities can be set to the nodes that not only have high degree 
and strength themselves, but also have neighbours with high degree and strength. Thanks to the 
theory of belief functions, the evidential semi-local centrality ranks the nodes by considering all these 
measures. Therefore the initial $c$ prototypes of the communities can be set to the nodes with the largest 
ESC values. 

Note that there is usually more than one center in a community. Taking the Twitter network 
as an example again, the fans of RMC who follow the club's official account may also pay attention 
to Cristiano Ronaldo, the most popular player in the team, who could to a great extent be another center of the 
community of RMC’s fans. These two centers (the accounts of the club and 
Ronaldo) both have large ESC values, but they are near to each other. This situation violates the 
rule that the chosen seeds should be as far away from each other as possible [2, 26]. 

The dissimilarity between the nodes could be utilised to solve this problem. Suppose the 
ranking order of the nodes with respect to their ESCs is $n_1 \ge n_2 \ge \cdots \ge n_n$. In the beginning $n_1$ 
is set to be the first prototype as it has the largest ESC, and then node $n_2$ is considered. If $d(n_1, n_2)$ 
(the dissimilarity between nodes 1 and 2) is larger than a threshold $\mu$, it is chosen to be the second 
prototype. Otherwise, we abandon $n_2$ and turn to check $n_3$. The process continues until all the $c$ 
prototypes are found. If there are not enough prototypes after checking all the nodes, we should 
decrease $\mu$ moderately and restart the search from $n_1$. In this paper we test the approach with the 
dissimilarity measure proposed in [49]. Based on our experiments, [0.7, 1] is a good empirical 
range for the threshold $\mu$. This seed choosing strategy is similar to that in [26]. 
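
The seed choosing strategy can be sketched in Python as follows. This is only an illustration under stated assumptions: a candidate is accepted when its dissimilarity to every prototype chosen so far exceeds $\mu$ (the text spells out only the comparison with $n_1$), dissimilarities are non-negative, and the decrement `step` used to "decrease $\mu$ moderately" is a hypothetical value. The names `select_prototypes`, `esc` and `D` are ours.

```python
import numpy as np

def select_prototypes(esc, D, c, mu=0.7, step=0.05):
    """Pick c seed nodes with large ESC that are mutually dissimilar.

    esc  : ESC value of each node
    D    : n x n dissimilarity matrix (assumed non-negative)
    mu   : dissimilarity threshold; lowered by `step` (an assumed value)
           whenever fewer than c seeds are found
    """
    esc = np.asarray(esc, dtype=float)
    D = np.asarray(D, dtype=float)
    if not 1 <= c <= len(esc):
        raise ValueError("c must be between 1 and the number of nodes")
    order = np.argsort(-esc, kind="stable")      # nodes by decreasing ESC
    while True:
        chosen = [int(order[0])]                 # largest ESC is always a seed
        for node in order[1:]:
            if len(chosen) == c:
                break
            # assumption: compare against all seeds chosen so far
            if all(D[node, p] > mu for p in chosen):
                chosen.append(int(node))
        if len(chosen) == c:
            return chosen, mu
        mu -= step                               # relax the threshold and restart
```

The loop terminates because, once $\mu$ drops below zero, every remaining node passes the test.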


4.3. The community detection algorithm based on MECM 

The whole community detection algorithm in social networks based on MECM is summarized 
in Algorithm 2. 


Algorithm 2 : Community detection algorithm based on MECM 

Input: $A$, the adjacency matrix; $W$, the weight matrix (if any); $\mu$, the threshold controlling 
the dissimilarity between the prototypes; $c_{min}$, the minimal number of communities; $c_{max}$, the 
maximal number of communities; the required parameters in the original MECM algorithm 
Initialization: Calculate the dissimilarity matrix of the nodes in the graph. Set the cluster 
number $c$ in MECM to $c = c_{min}$. 
repeat 

(1). Choose the initial $c$ prototypes using the strategy proposed in Section 4.2. 

(2). Run MECM with the corresponding parameters and the initial prototypes got in (1). 

(3). Calculate the evidential modularity using Eq. (42). 

(4). Let $c = c + 1$. 
until $c$ exceeds $c_{max}$. 

Output: Choose the number of communities at around which the modular function peaks, and 
output the corresponding credal partition of the graph. 


In the algorithm, $c_{min}$ and $c_{max}$ can be determined based on the original graph. Note that 
$c_{min} \ge 2$. The interval $[c_{min}, c_{max}]$ is an empirical range of the community number of the network. If $c$ is given, we can 
get a credal partition based on MECM and then the evidential modularity can be derived. As we 
can see, the modularity is a function of $c$ and it should peak at around the optimal value of $c$ for 
the given network. 
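
The outer loop of Algorithm 2 can be sketched as below. The three callables `init_prototypes`, `run_mecm` and `modularity` are hypothetical placeholders for the seed selection of Section 4.2, the MECM algorithm itself and Eq. (42); they are not defined here. Taking the plain maximum of the modularity is a simplification of "the number of communities at around which the modular function peaks".

```python
def choose_number_of_communities(c_min, c_max, init_prototypes, run_mecm, modularity):
    """Scan c = c_min, ..., c_max and keep the partition with the largest
    modularity. All three callables are user-supplied (hypothetical names):
      init_prototypes(c)  -> initial prototypes for c communities
      run_mecm(c, seeds)  -> credal partition obtained with these seeds
      modularity(part)    -> evidential modularity of that partition
    """
    if c_min < 2 or c_max < c_min:
        raise ValueError("need 2 <= c_min <= c_max")
    scores = {}
    best = None
    for c in range(c_min, c_max + 1):
        seeds = init_prototypes(c)
        partition = run_mecm(c, seeds)
        q = modularity(partition)
        scores[c] = q
        if best is None or q > best[1]:
            best = (c, q, partition)
    return best[0], best[2], scores
```

Returning the whole `scores` dictionary makes it possible to plot the modularity against $c$ and inspect the peak by eye.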

4.4. Performance evaluation 

The objective of the clustering problem is to partition similar data pairs into the same group. 
There are two types of correct decisions made by a clustering result: a true positive (TP) decision 
assigns two similar objects to the same cluster, while a true negative (TN) decision assigns two 
dissimilar objects to different clusters. Correspondingly, there are two types of errors we can 
commit: a false positive (FP) decision assigns two dissimilar objects to the same cluster, while a 
false negative (FN) decision assigns two similar objects to different clusters. Let $a$ (respectively, $b$) 
be the number of pairs of objects simultaneously assigned to identical classes (respectively, different 
classes) by the standard reference partition and the obtained one. Actually $a$ (respectively, $b$) is the 
number of TP (respectively, TN) decisions. Similarly, let $c$ and $d$ be the numbers of FP and FN 
decisions respectively. Two popular measures that are typically used to evaluate the performance 
of hard clusterings are precision and recall. Precision (P) is the fraction of relevant instances (pairs 
in identical groups in the clustering benchmark) out of those retrieved instances (pairs in identical 
groups of the discovered clusters), while recall (R) is the fraction of relevant instances that are 
retrieved. Then precision and recall can be calculated by 


$$P = \frac{a}{a + c} \quad \text{and} \quad R = \frac{a}{a + d} \tag{43}$$

respectively. The Rand index (RI) measures the percentage of correct decisions and it can be 
defined as 

$$RI = \frac{2(a + b)}{n(n - 1)}, \tag{44}$$


where n is the number of data objects. In fact, precision measures the rate of the first type of 
errors (FP), recall (R) measures another type (FN), while RI measures both. 
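
The pair counts $a$, $b$, $c$, $d$ and the measures of Eqs. (43) and (44) can be computed directly from two label vectors. The following Python sketch does so by enumerating all pairs; the function name is ours and the quadratic enumeration is chosen for clarity rather than speed.

```python
from itertools import combinations

def pairwise_scores(reference, predicted):
    """Return (precision, recall, Rand index) of a hard partition
    `predicted` with respect to the reference partition `reference`."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    a = b = c = d = 0
    for i, j in combinations(range(n), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            a += 1          # true positive pair
        elif not same_ref and not same_pred:
            b += 1          # true negative pair
        elif same_pred:
            c += 1          # false positive pair
        else:
            d += 1          # false negative pair
    precision = a / (a + c) if a + c else 0.0
    recall = a / (a + d) if a + d else 0.0
    rand_index = 2 * (a + b) / (n * (n - 1))
    return precision, recall, rand_index
```

For instance, with reference labels `[0, 0, 0, 1, 1, 1]` and predicted labels `[0, 0, 1, 1, 1, 1]` one gets $a = 4$, $b = 6$, $c = 3$, $d = 2$, hence $P = 4/7$, $R = 2/3$ and $RI = 2/3$.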


For fuzzy and evidential clusterings, objects may be partitioned into multiple clusters with 
different degrees. In such cases precision would consequently be low [34]. Usually the fuzzy and 
evidential clusters are made crisp before calculating the measures, using for instance the maximum 
membership criterion [34] and pignistic probabilities [32]. Thus in the work presented in this paper, 
we have hardened the fuzzy and credal clusters by maximizing the corresponding membership and 
pignistic probabilities and calculated precision, recall and RI for each case. 
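
As an illustration of this hardening step, the sketch below computes pignistic probabilities from the mass function of one object and picks the most probable cluster. It assumes the usual normalised form of the pignistic transformation, $BetP(\omega_k) = \sum_{A \ni \omega_k} m(A) / \big(|A| (1 - m(\emptyset))\big)$; representing a mass function as a dictionary from `frozenset` to mass is our choice.

```python
def pignistic(mass, n_clusters):
    """Pignistic probabilities of one object.

    mass : dict mapping frozenset of cluster indices -> mass,
           with frozenset() standing for the empty set.
    Assumes the normalised transformation (division by 1 - m(emptyset)).
    """
    empty = mass.get(frozenset(), 0.0)
    if empty >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp = [0.0] * n_clusters
    for focal, m in mass.items():
        if not focal:
            continue                      # the empty set carries no share
        share = m / (len(focal) * (1.0 - empty))
        for k in focal:
            betp[k] += share              # split m(A) equally inside A
    return betp

def harden(mass, n_clusters):
    """Index of the cluster with the largest pignistic probability."""
    betp = pignistic(mass, n_clusters)
    return max(range(n_clusters), key=betp.__getitem__)
```

With $m(\{\omega_1\}) = 0.5$, $m(\{\omega_1, \omega_2\}) = 0.4$ and $m(\emptyset) = 0.1$, the pignistic probabilities are $0.7/0.9$ and $0.2/0.9$, so the object is hardened to $\omega_1$.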


The introduced imprecise clusters can avoid the risk of grouping an object into a specific class 
without strong belief. In other words, a data pair is clustered into the same specific group 
only when we are quite confident, and thus the misclassification rate will be reduced. However, 
partitioning too many objects into imprecise clusters means that many objects are not assigned 
to their precise groups. In order to show the effectiveness of the proposed method in these aspects, 
we use the evidential precision (EP) and evidential recall (ER): 


$$EP = \frac{n_{er}}{N_e}, \quad ER = \frac{n_{er}}{N_r}. \tag{45}$$


In Eq. (45), the notation $N_e$ denotes the number of pairs partitioned into the same specific group 
by evidential clusterings, and $n_{er}$ is the number of relevant instance pairs out of these specifically 
clustered pairs. The value $N_r$ denotes the number of pairs in the same group of the clustering 
benchmark, and ER is the fraction of specifically retrieved instances (grouped into an identical 
specific cluster) out of these relevant pairs. When the partition degrades to a crisp one, EP and 
ER are equal to the classical precision and recall measures respectively. EP and ER reflect the accuracy 
of the credal partition from different points of view, but we cannot evaluate the clusterings from 
a single one of them. For example, if all the objects are partitioned into imprecise clusters except two 
relevant data objects grouped into a specific class, then EP = 1. But we could not say this 
is a good partition since it does not provide us with any information of great value. In this case 
ER $\approx$ 0. Thus ER could be used to express the efficiency of the method for providing valuable 
partitions. Certainly we can combine EP and ER like RI to get the evidential rank index (ERI) 
describing the accuracy: 


$$ERI = \frac{2(a^* + b^*)}{n(n - 1)}, \tag{46}$$


where $a^*$ (respectively, $b^*$) is the number of pairs of objects simultaneously clustered to the same 
specific class (i.e., singleton class; respectively, different classes) by the standard reference partition 
and the obtained credal one. Note that for evidential clusterings, the precision, recall and RI measures 
are calculated after the corresponding hard partitions are obtained, while EP, ER and ERI are based on 
hard credal partitions [32]. 
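
A possible implementation of Eqs. (45) and (46) is sketched below. Each object of the hard credal partition is represented by the `frozenset` of classes it is assigned to. One point is an assumption on our part: a pair is counted in $b^*$ when the two objects belong to different reference classes and are not grouped into the same singleton class, so that a pair of objects both lying in an imprecise class is still regarded as separated.

```python
from itertools import combinations

def evidential_indices(reference, credal):
    """Return (EP, ER, ERI) of a hard credal partition.

    reference : list of reference class labels
    credal    : list of frozensets; a singleton means a specific class,
                a larger set means an imprecise class.
    Assumption: only pairs sharing the same singleton count as "grouped".
    """
    n = len(reference)
    if n != len(credal) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    n_e = n_er = n_r = b_star = 0
    for i, j in combinations(range(n), 2):
        same_ref = reference[i] == reference[j]
        same_specific = len(credal[i]) == 1 and credal[i] == credal[j]
        n_r += int(same_ref)
        n_e += int(same_specific)
        n_er += int(same_ref and same_specific)          # this is a*
        b_star += int(not same_ref and not same_specific)
    ep = n_er / n_e if n_e else 0.0
    er = n_er / n_r if n_r else 0.0
    eri = 2 * (n_er + b_star) / (n * (n - 1))
    return ep, er, eri
```

As a check of the limiting behaviour discussed above, take ten objects in two reference classes of sizes 6 and 4 (the sizes are assumed here) and put every object into the imprecise class except two objects of the first class, which share a singleton. The function then gives EP = 1, ER = 1/21 and ERI = 50/90, i.e. a high precision with almost no recall.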


Example 1. In order to show the significance of the above performance measures, an example 
containing only ten objects from two groups is presented here. The three partitions are given in 
Figs. 1-b to 1-d. The values of the six evaluation indices (P, R, RI, EP, ER, ERI) are listed in Tab. 1. 







Figure 1: A small data set with imprecise classes. 


Table 1: Evaluation indices of the partitions. 

              EP      ER      ERI     P       R       RI
Partition 1   0.6190  0.6190  0.6444  0.6190  0.6190  0.6444
Partition 2   1.0000  0.6190  0.8222  -       -       -
Partition 3   1.0000  0.0476  0.5556  -       -       -

We can see that if we simply partition the nodes in the overlapped area, the risk of misclassification 
is high in terms of precision. The introduced imprecise cluster $\omega_{12} = \{\omega_1, \omega_2\}$ enables us to 
make soft decisions; as a result the accuracy of the specific partitions is high. However, if too many 
objects are clustered into imprecise classes, which is the case of partition 3, the result is pointless although 
EP is high. Generally, EP denotes the accuracy of the specific decisions, while ER represents the 
efficiency of the approach. We remark that the evidential indices degrade to the corresponding 
classical indices (e.g., evidential precision degrades to precision) when the partition is crisp. 

5. Experiments 

In this section a number of experiments are performed on classical data sets in the distance 
space and on graph data for which only the dissimilarities between nodes are known. The obtained 
credal partitions are compared with hard and fuzzy ones using the evaluation indices proposed in 
Section 4.4 to show the merits of MECM. 

5.1. Overlapped data set 

Overlapping objects have attracted attention recently, but clustering approaches still 
handle them inefficiently. Due to the introduction of imprecise classes, MECM has the advantage of 
being able to detect overlapped clusters. In the first example, we will use overlapped data sets to illustrate the 
behavior of the proposed algorithm. 

We start by generating 2 × 100 points uniformly distributed in two overlapped circles with the 
same radius R = 30 but with different centers. The coordinates of the first circle’s center are (0, 0) 
while the coordinates of the other circle’s center are (30, 30). The data set is displayed in Fig. 2-a. 
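
A data set of this kind can be generated, for instance, as in the following sketch. Taking the square root of a uniform variate for the radius is what makes the points uniform over the disc area; the seed and the function name are arbitrary choices, and the sample is of course not the one used in the experiments reported here.

```python
import numpy as np

def sample_disc(n, center, radius, rng):
    """Draw n points uniformly from the disc with the given center and radius."""
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, size=n))   # sqrt gives area-uniform radii
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((center[0] + r * np.cos(theta),
                            center[1] + r * np.sin(theta)))

rng = np.random.default_rng(0)                             # arbitrary seed
X = np.vstack((sample_disc(100, (0.0, 0.0), 30.0, rng),
               sample_disc(100, (30.0, 30.0), 30.0, rng)))
labels = np.repeat([0, 1], 100)                            # reference partition
```

Moving the second center closer to or farther from the first changes the overlap rate of the two groups.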




a. Overlapped data b. Clustering result of MECM 

Figure 2: Clustering of overlapped data set 


In order to show the influence of parameters in MECM and ECM, different values of $\gamma$, $\alpha$, $\eta$ 
and $\delta$ have been tested for this data set. Fig. 3-a displays the three evidential indices 
varying with $\gamma$ ($\alpha$ is fixed to be 2) by MECM, while Fig. 3-b depicts the results of MECM with 
different $\alpha$ but a fixed $\gamma = 0.4$ ($\eta$ and $\delta$ are set to 0.7 and 50, respectively, in the tests). For fixed 
$\alpha$ and $\gamma$, the results with different $\eta$ and $\delta$ are shown in Fig. 3-c. The effect of $\alpha$ and $\delta$ on the 
clusterings of ECM is illustrated in Fig. 3-e. As we can see, for both MECM and ECM, if we want 
to make more imprecise decisions to improve EP, parameter $\alpha$ can be decreased. In MECM, we 
can also reduce the value of parameter $\gamma$ to accomplish the same purpose. Although both $\alpha$ and 
$\gamma$ have an effect on imprecise clusters in MECM, the mechanisms by which they work are different. Parameter 
$\alpha$ adjusts the penalty degree to control the imprecise rates of the results, whereas $\gamma$ achieves 
the same aim by regulating the uncertainty degree of imprecise classes. It can be 
seen from the figures that the effect of $\gamma$ is more conspicuous than that of $\alpha$. Moreover, even when $\alpha$ is 
set too high to obtain good clusterings, “good” partitions can still be got by adjusting $\gamma$. 
For both MECM and ECM, the stable limiting values of evidential measures are around 0.7 
and 0.8. Such values suggest the equivalence of the two methods to a certain extent. Parameter 
$\eta$ is used for discounting the distance between uncertain objects and specific clusters. As pointed 
out in Fig. 3-c, if $\gamma$ and $\alpha$ are well set, it has little effect on the final clusterings. The same is true 
in the case of $\delta$, which is applied to detect outliers. The effect of the different values of parameter 
$\beta$ is illustrated in Fig. 3-d. We can see that it has little influence on the final results as long as 
it is larger than 1. As in FCM and ECM, for which it is a usual choice, we use $\beta = 2$ in all the 
following experiments. 

The improvement of precision will bring about a decline of recall, as more data cannot 
be clustered into specific classes. What we should do is to set the parameters based on our own 
requirements to make a tradeoff between precision and recall. For instance, if we want to make a 
cautious decision in which EP is relatively high, we can reduce $\gamma$ and $\alpha$. Values of these parameters 
can also be learned from historical data if such data are available. 

a. MECM (with respect to $\gamma$)   b. MECM (with respect to $\alpha$)
c. MECM (with respect to $\eta$ or $\delta$)   d. MECM (with respect to $\beta$)
e. ECM (with respect to $\alpha$ or $\delta$)

Figure 3: Clustering of overlapped data set with different parameters 


For the objects in the overlapped area, it is difficult to make a hard decision, i.e. to decide 
about their specific groups. Thanks to the imprecise clusters, we can make a soft decision. As 
analysed before, the soft decision will improve the precision of the total results and reduce the risk of 
misclassifications caused by simply partitioning the overlapped objects into specific classes. However, 
too many imprecise decisions will decrease the recall value. Therefore, the ideal partition should 
make a compromise between the two measures. Setting $\alpha = 1.8$, $\gamma = 0.2$, $\eta = 0.7$ and $\delta = 50$, the 
“best” (with relatively high values on both precision and recall) clustering result by MECM is 
shown in Fig. 2-b. As we can see, most of the data in the overlapped area are partitioned into the 
imprecise cluster $\omega_{12} = \{\omega_1, \omega_2\}$ by the application of MECM. We adjust the coordinates of the 
center of the second circle to get overlapped data with different proportions (overlap rates), and 
the validity indices of the clustering results by different methods are illustrated in Fig. 4. For 
the application of MECM, MCM and MFCM, each algorithm is invoked 20 times with randomly 
selected initial prototypes for the same data set and the mean values of the evaluating indices are 
reported. Fig. 4-d shows the average values of the indices by MECM (plus and minus 
one standard deviation) for 20 repeated experiments as a function of the overlap rates. As we can 
see, the initial prototypes indeed have an effect on the final results, especially when the overlap rates 
are high. Certainly, we can reduce this influence by repeating the algorithm many times, but this 
is too expensive for MECM. Therefore, we suggest using the prototypes obtained by MFCM or 
MCM as the initial ones. In the following experiments, we will set the initial prototypes to be the ones 
got by MFCM. 

As can be seen, for different overlap rates, the classical measures such as precision, recall, 
and RI are almost the same for all the methods. This reflects that pignistic probabilities play a 
similar role to fuzzy membership. But we can see that for MECM, EP is significantly high, and the 
increase of overlap rates has the least effect on it compared with the other methods. Such an effect can 
be attributed to the introduced imprecise clusters, which enable us to make a compromise decision 
between hard ones. But as many points are clustered into imprecise classes, the evidential recall 
value is low. 

Overall, this example reflects one of the advantages of MECM: it can detect overlapped 
clusters. The objects in the overlapped area can be clustered into imprecise classes by this 
approach. Other available information or special techniques could be utilised for these 
imprecise data when we have to make hard decisions. Moreover, partitions with different degrees of 
imprecision can be got by adjusting the parameters of the algorithm based on our own requirements. 

a. Precision b. Recall 

c. RI d. Repeated experiments for MECM 

Figure 4: Clustering of overlapped data set with different overlap rates. Figure-d shows the average values of the 
indices (plus and minus one standard deviation) for 20 repeated experiments, as a function of the overlap rates. For 
MECM $\alpha = 1.8$, $\gamma = 0.2$, $\eta = 0.7$, $\delta = 50$. 

5.2. Classical data sets from Gaussian mixture model 

In the second experiment, we test on a data set consisting of 3 × 50 + 2 × 5 points generated 
from different Gaussian distributions. The first 3 × 50 points are from Gaussian distributions 
$G(\mu_k, \Sigma_k)$ $(k = 1, 2, 3)$ with 

$$\mu_1 = (0, 0)^T, \quad \mu_2 = (40, 40)^T, \quad \mu_3 = (80, 80)^T \tag{47}$$

and the last 2 × 10 data are noisy points following $G(\mu_k, \Sigma_k)$ $(k = 4, 5)$ with 

$$\mu_4 = (-50, 90)^T, \quad \mu_5 = (-10, 130)^T \tag{48}$$



MECM is applied with the following settings: $\alpha = 1$, $\delta = 100$, $\eta = 0.7$, while ECM has been 
tested using $\alpha = 1.7$, $\delta = 100$ (the appropriate parameters can be determined similarly as in the 
first example). One can see from Figs. 5-b and 5-c that MCM and MFCM can partition most of the 
regular data in $\omega_1$, $\omega_2$ and $\omega_3$ into their correct clusters, but they cannot detect the noisy points 
correctly. These noisy data are simply grouped into a specific cluster by both approaches. As can 
be seen from Fig. 5-d, for the points located in the middle part of $\omega_2$, ECM cannot find their 
exact group and misclassifies them into the imprecise cluster $\omega_{13}$. In the figures $\omega_{ij} = \{\omega_i, \omega_j\}$ denotes 
imprecise clusters. 

As mentioned before, imprecise classes in MECM can measure ignorance and uncertainty at 
the same time, and the degree of ignorance in meta clusters can be adjusted by $\gamma$. We can see that 
MECM does not detect many points in the overlapped area between two groups if $\gamma$ is set to 0.6. In 
such a case the test objects are partitioned into imprecise clusters mainly because of our ignorance 
about their specific classes. These objects attributed to meta classes mainly belong to the noisy data 
in $\omega_4$ and $\omega_5$. The distance of these points to the prototypes of specific clusters is large (but not too 
large, or they would be regarded as belonging to the empty set, see Fig. 5-e). Thus the distance between the 
prototype vectors is relatively small, so that these specific clusters are indistinguishable. Decreasing 
$\gamma$ to 0.2 makes the imprecise classes denote more uncertainty, as can be seen from Fig. 5-f, 
where many points located in the margin of each group are clustered into imprecise classes. In 
such a case, meta classes rather reflect our uncertainty on the data objects’ specific cluster. 

Tab. 2 lists the indices for evaluating the different methods. Bold entries in each 
column of this table (and also other tables in the following) indicate that the results are significant 
as the top performing algorithm(s) in terms of the corresponding evaluation index. We can see 
that the precision, recall and RI values for all approaches are similar, except for those obtained 
for ECM, which are significantly lower. As these classical measures are based on the associated 
pignistic probabilities for evidential clusterings, it seems that credal partitions can provide the same 
information as crisp and fuzzy ones. But from the same table, we can also see that the evidential 
measures EP and ERI obtained for MECM are higher (for hard partitions, the values of the evidential 
measures are equal to the corresponding classical ones) than the ones obtained for other methods. This 
fact confirms the accuracy of the specific decisions, i.e. decisions clustering the objects into specific 
classes. The advantage can be attributed to the introduction of imprecise clusters, with which we 
do not have to partition the uncertain or unknown objects into a specific cluster. Consequently, 
the risk of misclassification can be reduced. However, although ECM also deals with imprecise 
clusters, its accuracy is not improved as much as that of MECM. As illustrated 
before, with ECM many objects of a specific cluster are partitioned into an 
irrelevant imprecise class and, as a result, the evidential precision value and ERI decrease as well. 


a. Original artificial data   b. Clustering result of MCM
c. Clustering result of MFCM   d. Clustering result of ECM
e. Clustering result of MECM ($\gamma = 0.6$)   f. Clustering result of MECM ($\gamma = 0.2$)

Figure 5: Clustering of the artificial data set by different methods 


Table 2: The clustering results for Gaussian points by different methods. For each method, we generate 20 data sets 
with the same parameters and report the mean values of the evaluation indices for all the data sets. 

                         Precision  Recall  RI      EP      ER      ERI
MCM                      0.7802     0.9570  0.9002  0.7802  0.9570  0.9002
MFCM                     0.8616     0.9797  0.9484  0.8616  0.9797  0.9484
FCM                      0.8644     0.9820  0.9500  0.8644  0.9820  0.9500
ECM                      0.8215     0.9353  0.9222  0.9069  0.8436  0.9294
MECM ($\gamma = 0.2$)    0.8674     0.9855  0.9520  0.9993  0.7721  0.9336
MECM ($\gamma = 0.6$)    0.8662     0.9851  0.9515  0.9958  0.9586  0.9868


We also test on the “Iris flower”, “cat cortex” and “protein” data sets [12, 17, 22]. The first is 
an object data set while the other two are relational data sets. Thus we compare our method with FCM 
and ECM for the Iris data set, and with RECM and NRFCM (Non-Euclidean Relational Fuzzy 
Clustering Method [19]) for the last two data sets. The results are displayed in Fig. 6. 

The presented results allow us to sum up the characteristics of MECM. Firstly, one can see that 
the behavior of MECM is similar to that of ECM for traditional data. Besides, credal partitions provided 
by MECM allow us to recover the information of crisp and fuzzy partitions. Moreover, we are able to 
balance the influence of our uncertainty and ignorance according to the actual needs. The examples 
utilised so far deal with classical data sets. But MECM is also applicable to 
data sets for which only dissimilarity measures are known, e.g. social networks. Thus in 
the following experiments, we will use some graph data to illustrate the behaviour of the proposed 
method on the community detection problem in social networks. The dissimilarity index used here 
is the one brought forward by Zhou [49]. To have a fair comparison, in the following experiments 
we also compare with three classical algorithms for community detection, i.e. BGLL [4], LPA 
[36] and ZFCM (a fuzzy c-means based approach proposed by Zhang et al. [46]). The obtained 
community structures are compared using well-known performance measures, i.e., NMI (Normalized 
Mutual Information), VI (Variation of Information) and Modularity. 


a. Precision b. Recall 

c. RI 

Figure 6: Clustering results of different data sets 

5.3. Artificial graphs and generated benchmarks 

To show the performance of the algorithm in detecting communities in networks, we first apply 
the method to a sample network generated from a Gaussian mixture model. This model has been 
used for testing community detection approaches by Liu and Liu [29]. 

The artificial graph is composed of 3 × 50 nodes, $n_1, n_2, \cdots, n_{150}$, which are represented by 
150 sample points, $x_1, x_2, \cdots, x_{150}$, in two-dimensional Euclidean space. There are 3 × 50 points 
generated from Gaussian distributions $G(\mu_k, \Sigma_k)$ $(k = 1, 2, 3)$ with 

$$\mu_1 = (1.0, 4.0)^T, \quad \mu_2 = (2.5, 5.5)^T, \quad \mu_3 = (0.5, 6.0)^T \tag{49}$$


Then, the edges of the graph are generated by the following thresholding strategy: if $|x_i - x_j| < dist$, 
we set an edge between node $i$ and node $j$; otherwise the two nodes are not directly connected. The 
graph is shown in Fig. 7-a (with $dist = 0.8$) and the dissimilarity matrix of the nodes is displayed 
in Fig. 7-b. From the figures we can see that there are three significant communities in the graph, 
and some nodes on the border of their groups seem to be in overlapped classes as they are in contact 
with members of different communities simultaneously. 
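
The thresholding strategy amounts to a few lines of NumPy, as sketched below. We assume that $|x_i - x_j|$ denotes the Euclidean distance and that no self-loops are created; the sample points are passed in as an argument, and the function name is ours.

```python
import numpy as np

def threshold_graph(points, dist):
    """Adjacency matrix of the graph linking points closer than `dist`.

    Assumptions: Euclidean distance, strict inequality, no self-loops.
    """
    X = np.asarray(points, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences
    D = np.sqrt((diff ** 2).sum(axis=-1))         # pairwise Euclidean distances
    A = (D < dist).astype(int)
    np.fill_diagonal(A, 0)                        # remove self-loops
    return A
```

For example, the three points (0, 0), (0.5, 0) and (2, 0) with $dist = 0.8$ yield a single edge between the first two nodes and leave the third node isolated.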

Tab. 3 lists the indices for evaluating the results. It shows that MECM performs 
well, as the evidential precision resulting from its application is high. Using MECM also 
decreases the probability of clustering failure thanks to the introduction of imprecise clusters. 
This makes the decision-making process more cautious and reasonable. 


a. Gaussian network b. Dissimilarity matrix 

Figure 7: Artificial network from Gaussian mixture model 


Table 3: The results for Gaussian graph by different methods 

        Precision  Recall  RI      EP      ER      ERI     NMI     VI      Modularity
MCM     0.9049     0.9110  0.9392  0.9049  0.9110  0.9392  0.8282  0.3769  0.6100
MFCM    0.9067     0.9099  0.9396  0.9067  0.9099  0.9396  0.8172  0.4013  0.6115
ZFCM    0.9202     0.9224  0.9482  0.9202  0.9224  0.9482  0.8386  0.3545  0.6118
MECM    0.9470     0.9472  0.9652  0.9789  0.6060  0.8661  0.8895  0.2428  0.6072
BGLL    0.9329     0.9347  0.9564  0.9329  0.9347  0.9564  0.8597  0.3081  0.6119
LPA     0.3289     1.0000  0.3289  0.3289  1.0000  0.3289  0.0000  1.0986  0.0000


The algorithms are also compared by means of the Lancichinetti et al. [28] benchmark (LFR) 
networks. The results of different methods in two kinds of LFR networks with 500 and 1000 nodes 
are displayed in Figs. 8 and 9 respectively. The parameter $\mu$ shown on the x-axis in the figures indicates 
whether the network has clear communities. When $\mu$ is small, the graph has a clear community 
structure. In such a case, almost all the methods perform well. But we can see that when $\mu$ is 
large, the results by MECM have the largest values of precision. It means that the decisions which 
partition the nodes into a specific cluster are made with great confidence. In terms of NMI, the results are 
similar to those by BGLL and LPA, but better than those of MCM and MFCM. This fact 
shows that the hard or fuzzy partitions can be recovered when necessary. 

a. Precision   b. NMI

Figure 8: Comparison of MECM and other algorithms in LFR networks. The number of nodes is $N = 500$. The 
average degree is $|k| = 15$, and the pair for the exponents is $(\gamma, \beta) = (2, 1)$. 

a. Precision   b. NMI

Figure 9: Comparison of MECM and other algorithms in LFR networks. The number of nodes is $N = 1000$. The 
average degree is $|k| = 20$, and the pair for the exponents is $(\gamma, \beta) = (2, 1)$. 


5.4. Some real-world networks 

A. Zachary’s karate club. The Zachary’s Karate Club data [45] is an undirected graph 
which consists of 34 vertices and 78 edges. The original graph and the dissimilarity of the nodes 
are shown in Fig. 10-a and 10-b respectively. 

Let the parameters of MECM be $\alpha = 1.5$, $\delta = 100$, $\eta = 0.9$, $\gamma = 0.6$. The modularity functions 
by MECM, MCM, MFCM and ZFCM (Fig. 12-a) peak around $c = 2$ and $c = 3$. With $c = 2$, all the 
methods can detect the two communities exactly. If we set $c = 3$, a small community, which can 
also be found in the dissimilarity matrix (Fig. 10-b), is separated from $\omega_1$ by all the approaches 
(see Fig. 11). But ZFCM assigns the maximum membership to $\omega_1$ for node 9, which is actually in 
$\omega_2$. It seems that the loss of accuracy in the mapping process may cause such results. 

MECM does not find imprecise groups when $\gamma = 0.6$ as the network has an apparent community 
structure, and this reflects the fact that the communities are distinguishable for all the nodes. 
But there may be some overlap between two communities. The nodes in the overlapped cluster 
can be detected by decreasing $\gamma$ (increasing the uncertainty for imprecise communities). As is 
displayed in Figs. 11-c and 11-d, by decreasing $\gamma$ to 0.1 and 0.05 respectively (the other parameters 
remain unchanged), nodes 3 and 9 are clustered into both $\omega_1$ and $\omega_2$ (i.e. $\omega_{12}$) one after another. 

From the results we can see that MECM takes both ignorance and uncertainty into 
consideration while introducing imprecise communities. The degree of ignorance and uncertainty 
can be balanced by adjusting $\gamma$. The analysis shows that there appears to be only uncertainty, 
without ignorance, in the original club network. In order to show the performance of MECM 
under noisy conditions in which some communities are indistinguishable, two noisy nodes are 
added to the original graph in the next experiment. 


a. Original karate club network 


b. Dissimilarity matrix 


Figure 10: Original karate club network 


a. Clustering result of ZFCM   b. Clustering result of MCM, MFCM, and MECM ($\gamma = 0.6$)
c. Clustering result of MECM ($\gamma = 0.1$)   d. Clustering result of MECM ($\gamma = 0.05$)

Figure 11: Detected communities of karate club network by different methods 


a. Original karate club 


b. Karate club with noisy nodes 


Figure 12: Modularity functions of karate club network by different methods 

B. Karate club network with some added noisy nodes. In this test, two noisy nodes 
are added to the original karate club network (see Fig. 13-a). The first one is node 35, which is 
directly connected with nodes 18 and 27. The other one is node 36, which is connected to nodes 1 and 
33. It can be seen from the dissimilarity matrix that node 36 has stronger relationships with both 
communities than node 35. This is due to the fact that the nodes connected to node 36 play leading 
roles in their own groups, whereas node 35 is only in contact with two marginal nodes with “small” or insignificant 
roles in their own groups. 

The results obtained by the application of different methods are shown in Fig. 14. The MECM 
parameters are set as follows: $\alpha = 1.5$, $\delta = 100$, $\eta = 0.9$, and $\gamma$ is tuned according to the extent 
to which the imprecise communities reflect our ignorance. As we can see, MCM, MFCM and ZFCM 
simply group the two noisy nodes into $\omega_1$. With $\gamma = 0.4$ MECM regards node 36 as a member of 
$\omega_1$ while node 35 is grouped into the imprecise community $\omega_{12}$. Here $\omega_{12}$ mainly reflects our ignorance 
rather than uncertainty on the actual community of a node. This is why node 36 is not clustered 
into $\omega_{12}$: $\omega_1$ and $\omega_2$ are distinguishable for it, but we are just not sure about the final decision. 
The extent of uncertainty in imprecise communities increases as the value of $\gamma$ 
decreases. We can see that more nodes (including nodes 36, 9, 1, 12, 27, see Figs. 14-e and 14-f) are clustered 
into $\omega_{12}$ or $\omega_{13}$ due to uncertainty. The imprecise communities consider both ignorance (node 35) 
and uncertainty (other nodes). 

These results reflect the difference between ignorance and uncertainty. As node 35 is only
related to one outward node of each community, we are ignorant about which community it really
belongs to. On the contrary, node 36 connects with the key members (those playing an important
role in the community), and in this case the dissimilarity between the prototypes of $\omega_1$
and $\omega_2$ is relatively large, so they are distinguishable. Thus there is uncertainty rather
than ignorance about which community node 36 is in. In this network, node 36 is a “good” member
for both communities, whereas node 35 is a “poor” member. It can be seen from Fig. 15-a that the
fuzzy partition by MFCM also gives large, similar membership values to $\omega_1$ and $\omega_2$
for node 35, just as it does for such good members as nodes 36 and 9. This shows that fuzzy
partitions have difficulty distinguishing between ignorance and “equal evidence” (uncertainty).
In contrast, Fig. 15-b shows that the credal partition by MECM assigns small mass beliefs to
$\omega_1$ and $\omega_2$ for node 35, indicating our ignorance about its situation.
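
The distinction can be illustrated numerically. The following minimal sketch is not taken from the
experiments: it uses two hypothetical mass functions over two classes, one expressing ignorance
(most of the mass on the imprecise class) and one expressing equal evidence (the mass split between
the two singletons). The mass values, the class names `w1` and `w2` and the helper `pignistic` are
illustrative assumptions; the transformation itself is the standard pignistic formula, in which
each focal set shares its mass equally among its elements.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def pignistic(mass: Mass) -> Dict[str, float]:
    """Standard pignistic transformation of a mass function.

    Each focal set A shares its mass equally among its elements,
    after renormalising by 1 - m(empty set).
    """
    empty = mass.get(frozenset(), 0.0)
    if empty >= 1.0:
        raise ValueError("all mass is on the empty set")
    bet: Dict[str, float] = {}
    for focal, m in mass.items():
        for w in focal:  # the empty set has no elements and contributes nothing
            bet[w] = bet.get(w, 0.0) + m / (len(focal) * (1.0 - empty))
    return bet


# Hypothetical masses (illustrative values, not results from the paper).
ignorance: Mass = {
    frozenset({"w1"}): 0.1,
    frozenset({"w2"}): 0.1,
    frozenset({"w1", "w2"}): 0.8,  # most mass on the imprecise class
}
equal_evidence: Mass = {
    frozenset({"w1"}): 0.5,
    frozenset({"w2"}): 0.5,
}

bet_ignorance = pignistic(ignorance)
bet_equal = pignistic(equal_evidence)
print(bet_ignorance)
print(bet_equal)
```

Both mass functions yield a pignistic probability of 0.5 for each class, so the two situations
can no longer be told apart once the masses are flattened into probabilities; only the mass
assigned to the imprecise class separates them.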

Figure 13: Karate club network with two noisy nodes. (a) Karate club network with added nodes;
(b) dissimilarity matrix.

Figure 14: Detected communities in Karate club network with noisy nodes. (a) MCM; (b) ZFCM;
(c) MFCM; (d) MECM ($\gamma = 0.4$); (e) MECM ($\gamma = 0.1$); (f) MECM ($\gamma = 0.02$).

Figure 15: Fuzzy membership and mass belief of the nodes in karate club network with noisy nodes.
(a) Fuzzy membership by MFCM; (b) mass belief by MECM.

We also test our method on four other real-world graphs: American football network, Dolphins
network, Lesmis network and Political books network. The measures applied to evaluate the
performance of the different methods are listed in Tab. 4-7. It can be seen from the tables that,
for all the graphs, the application of MECM results in a community structure with a high
evidential precision level. This precision results from a cautious decision-making process which
clusters the noisy nodes into imprecise communities. In terms of classical performance measures
like NMI, VI and modularity, MECM is comparable to, and in several cases slightly better than, the
other algorithms. Note that these classical measures for hard partitions are calculated from the
pignistic probabilities associated with the credal partitions provided by MECM. Therefore, hard
decisions can still be recovered when using the proposed evidential detection approach.
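
Once hard labels have been recovered, classical measures can be computed against the ground
truth. The sketch below assumes the usual pair-counting definitions of precision, recall and Rand
index (RI); the exact definitions used for Tab. 4-7 are those given earlier in the paper and may
differ in detail. The function name `pair_counting_scores` and the toy labels are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def pair_counting_scores(
    truth: Sequence[int], pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, Rand index) computed over all node pairs."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    if len(truth) < 2:
        raise ValueError("at least two nodes are needed to form a pair")
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_pred and same_truth:
            tp += 1  # pair grouped together in both partitions
        elif same_pred:
            fp += 1  # grouped together by the method only
        elif same_truth:
            fn += 1  # grouped together in the ground truth only
        else:
            tn += 1  # separated in both partitions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, rand_index


# Toy labels (hypothetical): six nodes, two ground-truth communities.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
precision, recall, rand_index = pair_counting_scores(truth, pred)
print(precision, recall, rand_index)
```

In this toy case one node is misplaced, which gives a precision of 4/7 and a recall and Rand index
of 2/3 each.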


Table 4: The results for American football network by different methods

Method  Precision  Recall  RI      EP      ER      ERI     NMI     VI      Modularity
MCM     0.7416     0.8834  0.9661  0.7416  0.8834  0.9661  0.8637  0.6467  0.5862
MFCM    0.7583     0.8757  0.9678  0.7583  0.8757  0.9678  0.8715  0.6160  0.5745
ZFCM    0.8176     0.9082  0.9765  0.8176  0.9082  0.9765  0.9035  0.4653  0.6022
MECM    0.8232     0.9082  0.9771  0.9303  0.8681  0.9843  0.9042  0.4625  0.5995
BGLL    0.7512     0.9120  0.9689  0.7512  0.9120  0.9689  0.8903  0.5195  0.6046
LPA     0.6698     0.8298  0.9538  0.6698  0.8298  0.9538  0.8623  0.6580  0.5757


Table 5: The results for Dolphins network by different methods

Method  Precision  Recall  RI      EP      ER      ERI     NMI     VI      Modularity
MCM     1          1       1       1       1       1       1       0       0.3787
MFCM    1          1       1       1       1       1       1       0       0.3787
ZFCM    1          1       1       1       1       1       1       0       0.3787
MECM    1          1       1       1       1       1       1       0       0.3787
BGLL    0.9271     0.3583  0.6351  0.9271  0.3583  0.6351  0.4617  1.1784  0.5185
LPA     0.9250     0.5029  0.7070  0.9250  0.5029  0.7070  0.5595  0.8354  0.5070




Table 6: The results for Lesmis network by different methods

Method  Precision  Recall  RI      EP      ER      ERI     NMI     VI      Modularity
MCM     0.6109     0.5522  0.9005  0.6109  0.5522  0.9005  0.7381  1.1295  0.4732
MFCM    0.5774     0.6456  0.8971  0.5774  0.6456  0.8971  0.7743  0.9555  0.4705
ZFCM    0.7368     0.5769  0.9217  0.7368  0.5769  0.9217  0.7805  0.9666  0.4983
MECM    0.7065     0.7473  0.9299  0.9298  0.4368  0.9258  0.7977  0.8531  0.4884
BGLL    0.5796     0.8104  0.9033  0.5796  0.8104  0.9033  0.7551  0.9435  0.5556
LPA     0.4594     0.9643  0.8544  0.4594  0.9643  0.8544  0.7500  0.8637  0.5428


Table 7: The results for Political books network by different methods

Method  Precision  Recall  RI      EP      ER      ERI     NMI     VI      Modularity
MCM     0.8109     0.8030  0.8482  0.8109  0.8030  0.8482  0.5721  0.8426  0.4979
MFCM    0.8020     0.8187  0.8485  0.8020  0.8187  0.8485  0.5755  0.8256  0.4962
ZFCM    0.7928     0.7487  0.8234  0.7928  0.7487  0.8234  0.5301  0.9407  0.5048
MECM    0.7880     0.8081  0.8383  0.8458  0.6435  0.8128  0.5755  0.8247  0.4725
BGLL    0.8244     0.6203  0.7978  0.8244  0.6203  0.7978  0.5121  1.0987  0.5205
LPA     0.7331     0.8558  0.8200  0.7331  0.8558  0.8200  0.5612  0.7925  0.4604

These data sets can be found at http://networkdata.ics.uci.edu/index.php


5.5. Discussion

Here we discuss the applications for which MECM is designed. As analysed before, MECM requires
only the dissimilarities between objects, and the dissimilarity measure only needs to satisfy a
few intuitive assumptions. Therefore, the algorithm could be appropriate for many clustering tasks
on non-metric data objects. This type of data is very common in the social sciences, psychology,
etc., where metric assumptions about the similarities/dissimilarities cannot be guaranteed. The
price of this freedom on the data set is the restriction that the prototypes must be objects of
the data set themselves. Nevertheless, this constraint seems reasonable for social networks, as
the center of a community is usually the person (node) who is frequently in contact with the
others. Thus the approach can be applied to community detection problems. Thanks to the
introduction of imprecise classes, it could reduce the risk of assigning objects about which we
are uncertain or ignorant to an incorrect cluster. For this reason, the algorithm can help us make
soft decisions when clustering data sets without distinct cluster/community structures or with
overlapping ones.
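
To make the prototype constraint concrete, the following is a minimal sketch of one iteration of
a hard median clustering scheme in the spirit of MCM. It is not the MECM update itself, which
additionally handles masses on imprecise classes. Only a dissimilarity matrix is used, and each
new prototype is searched among the data objects. The function name `median_step` and the toy
matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def median_step(
    D: Sequence[Sequence[float]], prototypes: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """One assignment/update step using only the dissimilarity matrix D.

    D[i][l] is the dissimilarity between objects i and l (no metric
    properties are assumed); prototypes are indices of data objects.
    """
    n = len(D)
    # Assignment: each object joins the prototype it is least dissimilar to.
    labels = [
        min(range(len(prototypes)), key=lambda k: D[i][prototypes[k]])
        for i in range(n)
    ]
    new_prototypes: List[int] = []
    for k, old in enumerate(prototypes):
        members = [i for i in range(n) if labels[i] == k]
        if not members:
            new_prototypes.append(old)  # keep the old prototype of an empty cluster
            continue
        # Update: the candidates are the data objects themselves; pick the one
        # with the smallest total dissimilarity to the cluster members.
        best = min(range(n), key=lambda l: sum(D[i][l] for i in members))
        new_prototypes.append(best)
    return labels, new_prototypes


# Toy dissimilarity matrix (hypothetical): two well-separated groups of three.
D = [
    [0, 1, 2, 9, 9, 9],
    [1, 0, 1, 9, 9, 9],
    [2, 1, 0, 9, 9, 9],
    [9, 9, 9, 0, 1, 2],
    [9, 9, 9, 1, 0, 1],
    [9, 9, 9, 2, 1, 0],
]
labels, new_prototypes = median_step(D, prototypes=[0, 5])
print(labels, new_prototypes)
```

Starting from the peripheral objects 0 and 5, the update moves the prototypes to objects 1 and 4,
the most central members of their groups, without ever leaving the data set.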

Due to its computational complexity, the proposed algorithm is not directly suited to very large
data sets. However, we discuss here the possibility of applying the evidential community
detection approach to large-scale networks. Firstly, the number of parameters to be optimized is
exponential in the number of clusters [32]. When the number of classes is larger than 10, the
calculations are not tractable. But we can consider only a subclass with a limited number of focal
sets [32]. For instance, we could constrain the focal sets to be composed of at most two classes
(except $\Omega$). Secondly, for a network with millions of nodes, MCM or MFCM could be invoked as
a first step to merge some nodes into small clusters. After that, we can apply MECM to the
“coarsened” network. How to define the edges or connections of the new graph, however, remains to
be studied. Lastly, we emphasize that the evidential community detection algorithm could be used
to gain better insight into the network structure and to detect the imprecise classes. For a
large-scale network, it is difficult to make specific decisions for all of the nodes due to
limitations of time, money or techniques. In this case, we can use the proposed approach to make
some “soft” decisions first and then apply techniques specially designed for the imprecise parts
of the graph.
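
The saving obtained by limiting the focal sets can be checked with a short enumeration. This is a
sketch under the assumption that the restricted family consists of the empty set, all sets of at
most two classes, and the whole frame $\Omega$; the function name `restricted_focal_sets` and the
choice c = 10 are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List


def restricted_focal_sets(c: int, max_size: int = 2) -> List[FrozenSet[int]]:
    """Empty set, all subsets of at most max_size classes, and the whole frame."""
    classes = range(c)
    focal: List[FrozenSet[int]] = [frozenset()]
    for size in range(1, min(max_size, c) + 1):
        focal.extend(frozenset(s) for s in combinations(classes, size))
    omega = frozenset(classes)
    if omega not in focal:  # already present when c <= max_size
        focal.append(omega)
    return focal


c = 10
full_count = 2 ** c  # every subset of the c classes is a possible focal set
restricted_count = len(restricted_focal_sets(c))
print(full_count, restricted_count)
```

For c = 10 the full power set contains 1024 subsets, whereas the restricted family contains only
57, so the number of masses per object grows quadratically rather than exponentially with c.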


6. Conclusion 

We introduced a median variant of evidential c-means (MECM) as a new prototype-based clustering
algorithm in the present contribution. The proposed approach is an extension of median c-means
and median fuzzy c-means, based on the framework of belief function theory. The median-based
clustering requires only the definition of the dissimilarity between the objects. Therefore, it is
not restricted to metric spaces. The prototypes of the clusters are constrained to be data objects
themselves. MECM provides us with not only credal partitions but also hard and fuzzy partitions
as by-products through the computation of pignistic probabilities. Moreover, it can distinguish
ignorance from uncertainty, which fuzzy or crisp partitions cannot. Through the introduced
imprecise clusters, we can find overlapping and indistinguishable clusters for related nodes.
Thanks to the advantages of belief function theory and median clustering, MECM can be applied to
community detection problems in social networks. Like other median clustering approaches, MECM
tends to get stuck in local minima, so that several runs have to be performed to obtain good
performance. However, for the application of MECM to community detection, we propose an initial
prototype-selection scheme using the evidential semi-centrality to alleviate the problems caused
by the initial prototypes. The results of the presented experiments on artificial and real-world
networks show that the credal partitions on graphs provided by MECM are more refined than crisp
and fuzzy ones. Therefore, they enable us to gain a better understanding of the analysed community
structure. Some examples on classical metric spaces are also given to illustrate the interest of
MECM and to show how it differs from the existing methods.

As mentioned in this paper, there may be more than one center in each community of a network.
Nevertheless, we ignore the “multi-center” case to avoid the troubles brought by the need for an
initial seed using ESC and the definition of a threshold to control the distance between
prototypes. We are aware that this is a drawback of the presented approach, as not all the centers
in each community are taken into consideration. Therefore, we intend to include the feature of
multiple centers in our future research work.